  1. Core Server
  2. SERVER-87673

Queries which run on secondaries and exceed orphanCleanupDelaySecs may miss documents which were donated by chunk migration

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 3.6.0, 4.0.0, 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0, 8.0.0-rc0
    • Component/s: Sharding
    • Labels: None
    • Catalog and Routing
    • ALL
    • CAR Team 2024-03-18, CAR Team 2024-04-01

      The shard version protocol guarantees that a query sees each document on the shard that owned it when query execution began, and does not see the same document on any other shard, even if the chunk range is later migrated there. A query in a sharded cluster therefore never returns the same document twice.
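
The guarantee above can be illustrated with a toy model. This is only a sketch, not the server's implementation: the function name, the dictionaries, and the shard and chunk names are all hypothetical. Each shard returns a document only if it owned the document's chunk under the placement information the query started with.

```python
def run_query(shard_docs, placement_at_start):
    """Toy model: keep each document only from the shard that owned it at query start.

    shard_docs: shard name -> {document id: chunk id} (what is physically stored)
    placement_at_start: chunk id -> owning shard when the query began
    """
    results = []
    for shard, docs in shard_docs.items():
        for doc_id, chunk in docs.items():
            # A copy held by a non-owning shard (for example, the recipient of a
            # migration that started after the query) is filtered out.
            if placement_at_start[chunk] == shard:
                results.append((doc_id, shard))
    return results


# Mid-migration state: document 1 (chunk "A") exists on both donor and recipient.
shard_docs = {
    "shard0": {1: "A"},          # donor still holds the original copy
    "shard1": {1: "A", 2: "B"},  # recipient already holds the migrated copy
}
placement_at_start = {"A": "shard0", "B": "shard1"}
results = run_query(shard_docs, placement_at_start)
```

In this model document 1 is returned once, from the donor shard0, which is why the query depends on the donor keeping its copy until the query finishes.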

      However, the range deleter removes the stale copy of the document from the donor shard 15 minutes (the default value of the orphanCleanupDelaySecs server parameter) after the last query that was using pre-migration placement information finishes running on the primary of the donor shard. Because that wait only accounts for queries on the primary, a query in a sharded cluster may return incomplete results in the following situations:

      • Query runs on a secondary for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurs after the query started.
      • Query begins running on a primary and the primary steps down. The query then runs on the former primary, now a secondary, for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurs after the query started.
      • Query runs on a secondary for any length of time and a chunk migration runs with _waitForDelete == true, whether issued manually or by the balancer. With _waitForDelete set to true, the range deleter removes the stale copy of the document from the donor shard without waiting 15 minutes (orphanCleanupDelaySecs); it waits only until the last query that was using pre-migration placement information finishes running on the primary of the donor shard. The _waitForDelete option is, however, documented as intended only for internal testing.
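
The three situations above can be checked against a simplified timeline model. This is a sketch under stated assumptions, not server code: the names Query, earliest_range_deletion, and may_miss_documents are hypothetical, times are in seconds, and the model assumes no other pre-migration queries are running on the donor primary. A query that stays on the primary holds back the range deleter; a query on a secondary (or on a stepped-down primary) does not.

```python
from dataclasses import dataclass

ORPHAN_CLEANUP_DELAY_SECS = 900  # default orphanCleanupDelaySecs (15 minutes)


@dataclass
class Query:
    start: float              # when the query began executing
    end: float                # when the query finishes
    tracked_on_primary: bool  # True only if it runs on the donor primary throughout


def earliest_range_deletion(migration_commit, primary_queries_done,
                            wait_for_delete=False,
                            delay=ORPHAN_CLEANUP_DELAY_SECS):
    """Earliest time the donor may delete the migrated range (simplified)."""
    ready = max(migration_commit, primary_queries_done)
    # With _waitForDelete the orphanCleanupDelaySecs wait is skipped.
    return ready if wait_for_delete else ready + delay


def may_miss_documents(query, migration_commit, wait_for_delete=False,
                       delay=ORPHAN_CLEANUP_DELAY_SECS):
    # Only a migration that commits while the query is running matters.
    if not (query.start < migration_commit < query.end):
        return False
    # Assumption: an untracked query does not delay the range deleter at all.
    primary_done = query.end if query.tracked_on_primary else migration_commit
    deletion = earliest_range_deletion(migration_commit, primary_done,
                                       wait_for_delete, delay)
    return deletion < query.end


long_secondary = Query(start=0, end=1200, tracked_on_primary=False)   # 20 minutes
short_secondary = Query(start=0, end=600, tracked_on_primary=False)   # 10 minutes
long_primary = Query(start=0, end=1200, tracked_on_primary=True)
```

With a migration committing at t=60, `may_miss_documents(long_secondary, 60)` is true, `may_miss_documents(short_secondary, 60)` is false unless `wait_for_delete=True` is passed, and `may_miss_documents(long_primary, 60)` is false in both modes. The step-down case corresponds to a query with `tracked_on_primary=False`.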

            Assignee:
            Unassigned
            Reporter:
            Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes:
            0
            Watchers:
            20
