Details
-
Bug
-
Resolution: Fixed
-
Major - P3
-
4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0
-
None
-
Catalog and Routing
-
Fully Compatible
-
v7.2, v7.0, v6.0, v5.0, v4.4
-
CAR Team 2023-11-13, CAR Team 2023-11-27, CAR Team 2023-12-11, CAR Team 2023-12-25, CAR Team 2024-01-08, CAR Team 2024-01-22, CAR Team 2024-02-05
Description
Consider a multi-document transaction with readConcern=snapshot (without atClusterTime provided by the client) involving an unsharded collection, and the following interleaving:
1. Mongos chooses the 'atClusterTime' at which the transaction will run. Let's say it chooses TS100.
2. Concurrently, a movePrimary executes. The recipient finishes cloning documents at TS200, and the operation commits at TS210.
3. MovePrimary finishes and mongos becomes aware of the new db-primary shard.
4. Now mongos proceeds with routing the transaction statement to the new primary, but with atClusterTime=TS100.
5. On the shard, the databaseVersion check will pass, but the transaction will execute with a data snapshot @TS100. On the new primary the cloned documents only exist from TS200 onwards, so the transaction won't see them.
This can cause reads to not see the expected data, and writes to not modify the expected documents.
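The interleaving above can be reproduced with a toy model. This is only a sketch, not MongoDB code: the `Shard` class, its `insert` and `read_at` methods, and the shard names are hypothetical, and the only assumption is that a snapshot read at a given timestamp sees exactly the writes made at or before that timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard (hypothetical): every document is stored with its write timestamp."""
    name: str
    docs: dict = field(default_factory=dict)  # db name -> list of (write_ts, doc)

    def insert(self, db, doc, ts):
        self.docs.setdefault(db, []).append((ts, doc))

    def read_at(self, db, at_cluster_time):
        # A snapshot read only sees writes with write_ts <= at_cluster_time.
        return [doc for ts, doc in self.docs.get(db, []) if ts <= at_cluster_time]


donor = Shard("shard0")
recipient = Shard("shard1")
donor.insert("dbA", {"_id": 1}, ts=50)

# Step 1: the router picks the snapshot timestamp for the transaction.
at_cluster_time = 100

# Step 2: movePrimary clones the documents; the recipient writes them at TS200.
for _, doc in donor.docs["dbA"]:
    recipient.insert("dbA", doc, ts=200)

# Steps 3-5: the statement goes to the new primary, still with atClusterTime=TS100.
seen_on_donor = donor.read_at("dbA", at_cluster_time)
seen_on_recipient = recipient.read_at("dbA", at_cluster_time)

print(seen_on_donor)      # [{'_id': 1}]
print(seen_on_recipient)  # []
```

In this model the document is visible at TS100 on the old primary but not on the new one, which is the mismatch the databaseVersion check alone does not catch.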
Edit: A similar bug can occur with readConcerns other than snapshot. For instance, suppose shard1 initially owns dbA and shard2 owns dbB:
1. Mongos targets a first transaction statement for dbA to shard1. This opens a snapshot at T100 on that shard.
2. MovePrimary moves dbB to shard1, which commits at T200.
3. Mongos targets a second statement for dbB to shard1. DatabaseVersion check passes, but the snapshot used by the transaction on shard1 does not contain the expected data for dbB.
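The second scenario can be sketched the same way. Again this is a hypothetical toy model, not server code: `ShardTxn` and `run_statement` are made-up names, and the model assumes that the first statement a transaction runs on a shard fixes the snapshot used by all its later statements on that shard.

```python
class ShardTxn:
    """Toy per-shard transaction state (hypothetical): the first statement fixes the snapshot."""

    def __init__(self, store):
        self.store = store        # {db name: [(write_ts, doc), ...]}
        self.snapshot_ts = None   # set when the first statement runs

    def run_statement(self, db, now):
        if self.snapshot_ts is None:
            self.snapshot_ts = now  # the first statement opens the snapshot
        return [doc for ts, doc in self.store.get(db, []) if ts <= self.snapshot_ts]


shard1_store = {"dbA": [(10, {"_id": "a"})]}
txn_on_shard1 = ShardTxn(shard1_store)

# Step 1: the first statement targets dbA on shard1 and opens the snapshot at T100.
first_result = txn_on_shard1.run_statement("dbA", now=100)

# Step 2: movePrimary moves dbB to shard1; its documents appear there at T200.
shard1_store["dbB"] = [(200, {"_id": "b"})]

# Step 3: the second statement targets dbB on shard1 but reuses the T100 snapshot.
second_result = txn_on_shard1.run_statement("dbB", now=250)

print(first_result)   # [{'_id': 'a'}]
print(second_result)  # []
```

The second statement returns nothing even though dbB already lives on shard1, because the snapshot predates the movePrimary commit.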
Issue Links
- related to: SERVER-77506 Sharded multi-document transactions can mismatch data and ShardVersion (Closed)