[SERVER-76561] Lock-free acquisitions can read from sharded collection as unsharded when collection is dropped and recreated (ABA problem)
Created: 26/Apr/23
Updated: 26/Oct/23

Status: Blocked
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Daniel Gomez Ferro
Assignee: Backlog - Catalog and Routing
Resolution: Unresolved
Votes: 0
Labels: oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-62457 Lock-free reads causes query subsyste... Closed
is related to SERVER-76559 Lock-free reads can read from sharded... Closed
Assigned Teams:
Catalog and Routing
Operating System: ALL
Participants:

 Description   

This is hypothetical; we haven't tried to reproduce it.

The scenario looks like this:

  1. Collection exists and is unsharded.
  2. Mongos attaches shard version UNSHARDED to the request.
  3. acquireCollectionWithoutLocks checks the sharding placement, and it is correct.
  4. Collection becomes sharded from some other client running shardCollection.
  5. acquireCollectionWithoutLocks opens the snapshot at this point.
  6. Collection is dropped from some other client running drop.
  7. Collection is created again as unsharded from some other client running create.
  8. acquireCollectionWithoutLocks checks the sharding placement again, and everything checks out because the request was sent with shard version UNSHARDED and the collection is (once again) unsharded. The snapshot opened in step 5, however, still sees the sharded collection.
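
The interleaving above can be replayed in a toy model. This is only a sketch: the dictionary `catalog`, the helper `placement_check`, and the namespace `test.coll` are hypothetical stand-ins invented for illustration, not server code. It shows that both placement checks pass even though the snapshot was opened while the collection was sharded.

```python
# Toy model (hypothetical): the "catalog" maps a namespace to whether the
# collection is currently sharded. Nothing here is actual server code.
catalog = {}
ns = "test.coll"


def placement_check(request_unsharded):
    """Pass when the request's shard version agrees with the latest catalog state."""
    return (not catalog[ns]) == request_unsharded


catalog[ns] = False                                # 1. collection exists, unsharded
request_unsharded = True                           # 2. mongos attaches UNSHARDED
first_check = placement_check(request_unsharded)   # 3. first placement check
catalog[ns] = True                                 # 4. another client runs shardCollection
snapshot_sharded = catalog[ns]                     # 5. snapshot opened: sees sharded
del catalog[ns]                                    # 6. another client runs drop
catalog[ns] = False                                # 7. another client runs create
second_check = placement_check(request_unsharded)  # 8. second placement check

# Both checks pass, yet the snapshot belongs to the sharded incarnation.
print(first_check, second_check, snapshot_sharded)  # prints: True True True
```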

Currently we have a check in the AutoGetCollectionForLockFreeReads case to prevent this exact scenario, described in SERVER-62457; however, it might be broken (see SERVER-76559).



 Comments   
Comment by Daniel Gomez Ferro [ 13/Jun/23 ]

We think this is too difficult to solve before PM-3364, and we can't run into the issue described in SERVER-62457 because we don't report the collection as being sharded unless we request it to be sharded.

Closing this ticket.

Comment by Daniel Gomez Ferro [ 26/Apr/23 ]

One possible solution is to get the collection UUID before the first shardingPlacementCheck and check it again after opening the snapshot, to verify that we are dealing with the same collection throughout the execution of the acquisition.
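
A minimal sketch of that idea, in a toy model rather than the server's code. Everything named here (`acquire_without_locks`, `StaleAcquisition`, the two hook parameters, the tuple-based catalog) is hypothetical. The sketch assumes that sharding keeps the collection UUID and that a recreated collection gets a fresh one, and it compares the UUID read before the first check against both the snapshot and the latest catalog state; which of these the real check would need to consult is an open question.

```python
from uuid import uuid4


class StaleAcquisition(Exception):
    """Raised when the acquisition cannot prove it saw one consistent collection."""


def placement_matches(sharded, request_unsharded):
    # The request's shard version must agree with the catalog state.
    return (not sharded) == request_unsharded


def acquire_without_locks(catalog, ns, request_unsharded,
                          before_snapshot=None, after_snapshot=None):
    """Hypothetical acquisition; the hooks simulate concurrent clients."""
    uuid_before, sharded = catalog[ns]          # remember the UUID up front
    if not placement_matches(sharded, request_unsharded):
        raise StaleAcquisition("placement mismatch before snapshot")
    if before_snapshot:
        before_snapshot(catalog)
    snapshot = catalog[ns]                      # "open the snapshot" (immutable tuple)
    if after_snapshot:
        after_snapshot(catalog)
    uuid_after, sharded = catalog[ns]
    if not placement_matches(sharded, request_unsharded):
        raise StaleAcquisition("placement mismatch after snapshot")
    # UUID guard: all three observations must be the same collection.
    if not (uuid_before == snapshot[0] == uuid_after):
        raise StaleAcquisition("collection UUID changed")
    return snapshot


def shard(cat):
    coll_uuid, _ = cat["test.coll"]
    cat["test.coll"] = (coll_uuid, True)        # toy assumption: UUID is kept


def drop_and_recreate(cat):
    del cat["test.coll"]
    cat["test.coll"] = (uuid4(), False)         # recreated collection: new UUID


catalog = {"test.coll": (uuid4(), False)}
try:
    acquire_without_locks(catalog, "test.coll", True, shard, drop_and_recreate)
    detected, reason = False, ""
except StaleAcquisition as exc:
    detected, reason = True, str(exc)
```

In this model the drop-and-recreate interleaving is rejected by the UUID guard even though both placement checks pass, while an acquisition with no concurrent changes returns its snapshot normally.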
