- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 3.6.0, 4.0.0, 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0, 8.0.0-rc0
- Component/s: Sharding
- Assigned Teams: Catalog and Routing
- Operating System: ALL
- Sprint: CAR Team 2024-03-18, CAR Team 2024-04-01
- Labels: (copied to CRM)
The shard versioning protocol guarantees that a query sees each document from the shard that owned it at the very beginning of query execution, and that the query won't see the same document from other shards even if the chunk range is later migrated to them. This means a query in a sharded cluster never returns the same document twice.
However, range deletion deletes the stale copy of the document from the donor shard 15 minutes (the default value of the orphanCleanupDelaySecs server parameter) after the last remaining query on the donor shard's primary that was using the pre-migration placement information finishes running. This means a query in a sharded cluster may return incomplete results in the following situations:
- Query runs on a secondary for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurred after the query started.
- Query begins running on a primary and the primary steps down. The query then continues running on the former primary, now a secondary, for longer than 15 minutes (orphanCleanupDelaySecs), and a chunk migration occurred after the query started.
- Query runs on a secondary for any amount of time and a chunk migration is run with _waitForDelete == true, either manually or by the balancer. Setting _waitForDelete to true causes range deletion to delete the stale copy of the document from the donor shard without waiting 15 minutes (orphanCleanupDelaySecs); instead, the range deleter waits only until the last remaining query on the donor shard's primary that was using the pre-migration placement information finishes running. Note that the _waitForDelete option is documented as being meant only for internal testing purposes. (See the shell sketches after this list.)
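For reference, the delay described above is controlled per shard by the orphanCleanupDelaySecs server parameter. A minimal mongo shell sketch for inspecting and raising it, run against a shard's primary mongod (the 1800-second value is an arbitrary illustration, not a recommendation):

```javascript
// Read the current orphan cleanup delay (defaults to 900 seconds, i.e. 15 minutes).
db.adminCommand({ getParameter: 1, orphanCleanupDelaySecs: 1 });

// Raise the delay so long-running secondary reads have more time to drain
// before the donor shard's stale copies are range-deleted.
db.adminCommand({ setParameter: 1, orphanCleanupDelaySecs: 1800 });
```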
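And a sketch of how the third scenario is triggered, assuming a hypothetical collection test.coll sharded on { _id: 1 } and a recipient shard named shard0001 (run on a mongos; _waitForDelete is the internal-testing-only option the description refers to):

```javascript
// Migrate the chunk containing { _id: 42 } and delete the donor's stale copies
// immediately, instead of waiting orphanCleanupDelaySecs. A query still reading
// on a donor secondary can now miss those documents.
db.adminCommand({
  moveChunk: "test.coll",
  find: { _id: 42 },
  to: "shard0001",
  _waitForDelete: true
});
```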
- is depended on by:
  - SERVER-31837 Recipient shard should not wait for `orphanCleanupDelaySecs` (Blocked)
- is related to:
  - SERVER-67688 notifySecondariesThatDeletionIsOccurring is not notifying secondaries (Closed)
  - SERVER-77354 Increase the value of orphanCleanupDelaySecs for concurrency_sharded_causal_consistency_and_balancer (Closed)
  - SERVER-29405 After move chunk out, pause for secondary queries to drain (Closed)
  - SERVER-68352 Only wait for `orphanCleanupDelaySecs` before allowing range deletion to start (Closed)