[SERVER-43848] find/update/delete w/o shard key predicate under txn with snapshot read can miss documents Created: 04/Oct/19  Updated: 29/Oct/23  Resolved: 04/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.0
Fix Version/s: 4.2.6, 4.4.0-rc0, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Sharding 2020-03-09
Participants:
Linked BF Score: 7

 Description   

Scenario:

shardKey: {x: 1}
chunks: [MinKey, 0) @ shard1, [0, MaxKey) @ shard0

1. The txn sets its read concern timestamp to t5.
2. A migration moves the document {x: 1, y: 1} from shard0 (shard0's last chunk) to shard1 at t10.
3. Mongos refreshes to the latest chunk metadata.
4. The txn targets an update/delete with predicate {y: 1}. Because the predicate does not include the shard key, this generates an index bound of [MinKey, MaxKey).
5. ChunkManager::getShardIdsForRange goes through every chunk that overlaps [MinKey, MaxKey) and gets the shardId that owned it at t5.
6. However, the loop has an optimization to exit early once the number of shards to be targeted equals the number of shards in the shard version map. This causes the loop to exit early, so the write targets only shard1.

The issue here is that the shard version map only includes shards with chunks, and it represents the mapping at t10, not t5. In the case above, 2 shards had chunks at t5, but only 1 shard had chunks at t10. Even though the document is currently on shard1, the update/remove will not see it there because it is running under the snapshot with ts = t5, and shard0, which held the document at t5, is never targeted.
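
The scenario can be reproduced with a small model. The following is a minimal Python sketch, not the server's C++ code: the Chunk class, the shard_ids_for_range function, and the early_exit flag are hypothetical names chosen for illustration, and the chunk history is simplified to a list of (timestamp, shard) pairs. It only shows how comparing the targeted set against a shard version map built at t10 can cut short a lookup done at t5.

```python
from dataclasses import dataclass

MIN_KEY = float("-inf")  # stand-in for MinKey
MAX_KEY = float("inf")   # stand-in for MaxKey


@dataclass
class Chunk:
    lo: float      # inclusive lower bound on shard key x
    hi: float      # exclusive upper bound on shard key x
    history: list  # [(valid_after_ts, shard_id)], newest entry first

    def shard_at(self, ts):
        # Return the shard that owned this chunk at timestamp ts.
        for valid_after, shard in self.history:
            if valid_after <= ts:
                return shard
        raise ValueError("no ownership history at this timestamp")


def shard_ids_for_range(chunks, lo, hi, ts, shard_versions, early_exit=True):
    # Hypothetical model of the targeting loop described in steps 5 and 6.
    targeted = set()
    for chunk in chunks:
        if chunk.hi <= lo or chunk.lo >= hi:
            continue  # chunk does not overlap the requested range
        targeted.add(chunk.shard_at(ts))
        # The optimization: stop once "every shard" has been targeted.
        # shard_versions reflects the latest metadata (t10), not ts (t5).
        if early_exit and len(targeted) == len(shard_versions):
            break
    return targeted


# Chunk ownership after the migration at t10.
chunks = [
    Chunk(MIN_KEY, 0, [(0, "shard1")]),
    Chunk(0, MAX_KEY, [(10, "shard1"), (0, "shard0")]),
]
shard_versions = {"shard1"}  # at t10 only shard1 owns chunks

buggy = shard_ids_for_range(chunks, MIN_KEY, MAX_KEY, 5, shard_versions)
full = shard_ids_for_range(chunks, MIN_KEY, MAX_KEY, 5, shard_versions,
                           early_exit=False)
print(buggy)  # {'shard1'}
print(full)   # contains both shard0 and shard1
```

With the early exit enabled the loop stops after the first chunk and never reaches [0, MaxKey), which shard0 owned at t5. Disabling the early exit here is only a way to show the difference in this model; it is not a description of the committed fix.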



 Comments   
Comment by Githook User [ 08/Apr/20 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-43848 find/update/delete w/o shard key predicate under txn with snapshot read can miss documents

(cherry picked from commit 305bbe0ed709ffc88916093cfbd716bfb8fea60b)
Branch: v4.2
https://github.com/mongodb/mongo/commit/72e265eccbb6e49497eb386a0829ed75bae8c032

Comment by Githook User [ 26/Mar/20 ]

Author:

{'name': 'Randolph Tan', 'username': 'renctan', 'email': 'randolph@10gen.com'}

Message: SERVER-43848 find/update/delete w/o shard key predicate under txn with snapshot read can miss documents

(cherry picked from commit 305bbe0ed709ffc88916093cfbd716bfb8fea60b)
Branch: v4.4
https://github.com/mongodb/mongo/commit/aeb7bc16841ba15794e0b5615feec0f01c494711

Comment by Githook User [ 04/Mar/20 ]

Author:

{'username': 'renctan', 'name': 'Randolph Tan', 'email': 'randolph@10gen.com'}

Message: SERVER-43848 find/update/delete w/o shard key predicate under txn with snapshot read can miss documents
Branch: master
https://github.com/mongodb/mongo/commit/305bbe0ed709ffc88916093cfbd716bfb8fea60b

Comment by Randolph Tan [ 09/Oct/19 ]

I think you're right. Updated title.

Comment by Jack Mulrow [ 09/Oct/19 ]

I think this affects reads too, since they also use ChunkManager::getShardIdsForRange() when targeting shards (through the getTargetedShardsForQuery() helper).
