Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.6.0, 4.0.0, 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0, 8.0.0-rc0
Component/s: Sharding
Labels:
None

Assigned Teams:

Catalog and Routing
Operating System:
ALL
Sprint:
CAR Team 2024-03-18, CAR Team 2024-04-01
Case:
Linked BF Score:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The shard version protocol guarantees that a query sees each document only from the shard that owned the document when the query began executing. The query does not see the same document from other shards, even if the chunk range is later migrated to them. As a result, a query in a sharded cluster never returns the same document twice.

However, the range deleter removes the stale copy of the document from the donor shard 15 minutes (the default value of the orphanCleanupDelaySecs server parameter) after the last query that was using the pre-migration placement information finishes running on the primary of the donor shard. Only queries on the donor primary are waited for, so a query in a sharded cluster may return incomplete results in the following situations:

- A query runs on a secondary for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurs after the query started.
- A query begins running on a primary and the primary steps down. The query then runs on the former primary, now a secondary, for longer than 15 minutes (orphanCleanupDelaySecs), and a chunk migration occurs after the query started.
- A query runs on a secondary for any length of time and a chunk migration runs with _waitForDelete == true, either manually or through the balancer. With _waitForDelete set to true, the range deleter removes the stale copy of the document from the donor shard without waiting the 15 minutes (orphanCleanupDelaySecs). Instead, it waits only until the last query that was using the pre-migration placement information finishes running on the primary of the donor shard. Note, however, that the _waitForDelete option is documented as intended for internal testing purposes only.
  - https://www.mongodb.com/docs/manual/reference/command/moveChunk/
  - https://www.mongodb.com/docs/manual/tutorial/manage-sharded-cluster-balancer/#wait-for-delete
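
The timing described above can be illustrated with a small model. This is only a sketch of the reasoning in this ticket, not server code: the Query class, the tracked_by_donor_primary flag, and both function names are hypothetical, and times are plain seconds on a single shared clock. The model assumes the range deleter waits only for queries it can see on the donor primary, then waits orphanCleanupDelaySecs unless _waitForDelete is set.

```python
from dataclasses import dataclass

# Default of the orphanCleanupDelaySecs server parameter (15 minutes).
ORPHAN_CLEANUP_DELAY_SECS = 900


@dataclass
class Query:
    start: float  # time the query started, in seconds
    end: float  # time the query finished, in seconds
    # Hypothetical flag: True only if the query runs on the donor primary for
    # its whole lifetime. False for secondary reads and for queries whose node
    # stepped down to secondary while they were running.
    tracked_by_donor_primary: bool


def range_deletion_time(migration_commit, queries, wait_for_delete=False,
                        delay=ORPHAN_CLEANUP_DELAY_SECS):
    """Time at which the donor deletes the migrated range in this model."""
    # Only pre-migration queries on the donor primary hold back the deleter.
    tracked_ends = [q.end for q in queries
                    if q.tracked_by_donor_primary and q.start < migration_commit]
    drained = max([migration_commit] + tracked_ends)
    # _waitForDelete skips the orphanCleanupDelaySecs wait entirely.
    return drained if wait_for_delete else drained + delay


def may_miss_documents(query, migration_commit, wait_for_delete=False,
                       delay=ORPHAN_CLEANUP_DELAY_SECS):
    """True if the range can be deleted while the query is still reading it."""
    if query.start >= migration_commit:
        # The query already uses the post-migration placement information.
        return False
    deleted_at = range_deletion_time(migration_commit, [query],
                                     wait_for_delete, delay)
    return deleted_at < query.end
```

For example, under these assumptions may_miss_documents(Query(0, 1200, False), 100) models the first situation: a secondary read that starts before a migration committed at t=100 and is still running after the range is deleted at t=1000. The same query with tracked_by_donor_primary=True is safe in the model, because the deleter waits for it to finish before starting the delay.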

is depended on by

SERVER-31837 Recipient shard should not wait for `orphanCleanupDelaySecs`

Closed

is duplicated by

SERVER-100158 Kill a query if the CollectionMetadataTracker has been invalidated

Closed

is fixed by

SERVER-100158 Kill a query if the CollectionMetadataTracker has been invalidated

Closed

is related to

SERVER-67688 notifySecondariesThatDeletionIsOccurring is not notifying secondaries

Closed

SERVER-77354 Increase the value of orphanCleanupDelaySecs for concurrency_sharded_causal_consistency_and_balancer

Closed

SERVER-29405 After move chunk out, pause for secondary queries to drain

Closed

SERVER-68352 Only wait for `orphanCleanupDelaySecs` before allowing range deletion to start

Closed

related to

SERVER-31837 Recipient shard should not wait for `orphanCleanupDelaySecs`

Closed


Assignee: Silvia Surroca
Reporter: Max Hirschhorn
Participants: Max Hirschhorn, Silvia Surroca
Votes: 0
Watchers: 37

Created: Mar 08 2024 04:37:04 AM UTC
Updated: Apr 02 2025 08:59:41 AM UTC
Resolved: Apr 02 2025 08:59:41 AM UTC
