Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.6.0, 4.0.0, 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0, 8.0.0-rc0
Component/s: Sharding
Labels: None
Assigned Teams: Catalog and Routing
Operating System: ALL
Sprint: CAR Team 2024-03-18, CAR Team 2024-04-01
Linked BF Score: 0

The shard version protocol guarantees that a query sees each document only from the shard that owned it when query execution began, and never sees the same document from another shard, even if the chunk range is later migrated there. This means a query in a sharded cluster never returns the same document twice.
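To illustrate the guarantee, here is a toy model in Python. It is only a sketch and not the server's implementation: `run_query`, the `placement_at_start` snapshot, and the shard names are hypothetical names made up for this example. Each shard returns only the documents it owned when the query began, which prevents duplicates but also means the result depends on the donor still holding its copy.

```python
def run_query(placement_at_start, docs_on_shard):
    """Return the documents visible to a query under a fixed placement snapshot.

    placement_at_start: dict of chunk id -> shard that owned the chunk when
        the query began (hypothetical stand-in for the placement information).
    docs_on_shard: dict of shard name -> list of (chunk_id, doc) pairs that
        are physically stored on that shard.
    """
    results = []
    for shard, stored in docs_on_shard.items():
        for chunk_id, doc in stored:
            # A shard only returns documents it owned when the query began.
            if placement_at_start.get(chunk_id) == shard:
                results.append(doc)
    return results


# The query started while "donor" owned chunk1.
placement = {"chunk1": "donor"}

# After the migration commits, the document exists on both shards.
after_migration = {
    "donor": [("chunk1", {"_id": 1})],
    "recipient": [("chunk1", {"_id": 1})],
}

# After range deletion, the donor's stale copy is gone.
after_range_deletion = {
    "donor": [],
    "recipient": [("chunk1", {"_id": 1})],
}

print(run_query(placement, after_migration))       # one copy, read from the donor
print(run_query(placement, after_range_deletion))  # empty: the document is missed
```

In this model the second call returns nothing because the query still filters by the pre-migration placement while the donor's copy has already been deleted.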

However, the range deleter removes the stale copy of the document from the donor shard 15 minutes (the default value of the orphanCleanupDelaySecs server parameter) after the last query that was still using the pre-migration placement information finishes running on the primary of the donor shard. This means a query in a sharded cluster may return incomplete results in the following situations:

- The query runs on a secondary for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurs after the query started.
- The query begins running on a primary and the primary steps down. The query then continues running on the former primary, now a secondary, for longer than 15 minutes (orphanCleanupDelaySecs), and a chunk migration occurs after the query started.
- The query runs on a secondary for any amount of time and a chunk migration is run with _waitForDelete == true, either manually or by the balancer. With _waitForDelete set to true, the range deleter removes the stale copy of the document from the donor shard without waiting 15 minutes (orphanCleanupDelaySecs). Instead, it waits only until the last query that was still using the pre-migration placement information finishes running on the primary of the donor shard. Note that the _waitForDelete option is documented as being meant only for internal testing purposes.
  - https://www.mongodb.com/docs/manual/reference/command/moveChunk/
  - https://www.mongodb.com/docs/manual/tutorial/manage-sharded-cluster-balancer/#wait-for-delete
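The timing in these scenarios can be sketched with a simplified Python model. This is an assumption-laden illustration, not the range deleter's actual scheduling logic: the function names, the use of plain seconds for timestamps, and the `wait_for_delete` flag (standing in for `_waitForDelete`) are all hypothetical. The only inputs taken from the description are the 15 minute default and the fact that only queries on the donor's primary delay the deletion.

```python
ORPHAN_CLEANUP_DELAY_SECS = 900  # 15 minutes, the default cited in the description


def earliest_range_deletion(migration_commit, primary_query_end_times,
                            wait_for_delete=False,
                            delay=ORPHAN_CLEANUP_DELAY_SECS):
    """Earliest time (in seconds) at which the donor may delete the range.

    primary_query_end_times: end times of queries on the donor's primary that
    still use the pre-migration placement. Queries on secondaries are
    deliberately absent: in this model nothing waits for them.
    """
    last_primary_query_end = max([migration_commit, *primary_query_end_times])
    # With wait_for_delete, the extra delay is skipped entirely.
    return last_primary_query_end + (0 if wait_for_delete else delay)


def secondary_query_may_miss_documents(query_start, query_end,
                                       migration_commit, deletion_time):
    """True if a secondary query could still be reading when the range is deleted."""
    uses_old_placement = query_start < migration_commit
    return uses_old_placement and query_end > deletion_time


# Migration commits at t=100; the last primary query on old placement ends at t=150.
deletion = earliest_range_deletion(100, [150])
print(deletion)  # 1050

# A secondary query from t=50 to t=1100 outlives the delay and is at risk.
print(secondary_query_may_miss_documents(50, 1100, 100, deletion))  # True

# With wait_for_delete, even a short secondary query is at risk.
fast_deletion = earliest_range_deletion(100, [150], wait_for_delete=True)
print(secondary_query_may_miss_documents(50, 200, 100, fast_deletion))  # True
```

The step-down scenario maps onto the same model if the query is treated as a secondary query from the moment the primary steps down.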

has to be done before

SERVER-31837 Recipient shard should not wait for `orphanCleanupDelaySecs`

Needs Scheduling

is related to

SERVER-67688 notifySecondariesThatDeletionIsOccurring is not notifying secondaries

Closed

SERVER-77354 Increase the value of orphanCleanupDelaySecs for concurrency_sharded_causal_consistency_and_balancer

Closed

SERVER-29405 After move chunk out, pause for secondary queries to drain

Closed

DOCS-10446 Docs for SERVER-29405: After move chunk out, pause for secondary queries to drain

Closed

SERVER-68352 Only wait for `orphanCleanupDelaySecs` before allowing range deletion to start

Closed

related to

SERVER-31837 Recipient shard should not wait for `orphanCleanupDelaySecs`

Needs Scheduling


Assignee: Unassigned

Reporter: Max Hirschhorn

Participants: Max Hirschhorn

Votes: 0

Watchers: 20

Created: Mar 08 2024 04:37:04 AM UTC

Updated: May 10 2024 03:19:46 PM UTC
