Core Server / SERVER-67385

Range deletion tasks may be wrongly scheduled before ongoing queries on range finish on a shard primary

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • 5.0.14, 6.0.2, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v6.0, v5.0
    • Sprint: Sharding EMEA 2022-08-22

      The CollectionShardingRuntime for a sharded collection keeps a reference to a MetadataManager, which is responsible for tracking a list of open cursors in order to know how many queries are running against the different filtering metadata snapshots of the collection at different points in time.

      On a shard primary node, this list is iterated when a range deletion task has to be scheduled, in order to determine whether the task must wait for running queries to complete or can be "safely" scheduled after orphanCleanupDelaySecs because no queries are acting on the orphaned range.
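The intended check can be sketched as follows. This is a simplified, hypothetical model of the bookkeeping described above (the real MongoDB classes differ): each tracker corresponds to one filtering-metadata snapshot and counts the open cursors still reading through it.

```cpp
#include <list>
#include <memory>

// Hypothetical simplified model; names do not match the real sources.
// One tracker per filtering-metadata snapshot of the collection.
struct MetadataTracker {
    int numOpenCursors = 0;  // queries still using this snapshot
};

class MetadataManager {
public:
    // Metadata snapshots kept alive for the collection, oldest first.
    std::list<std::shared_ptr<MetadataTracker>> snapshots;

    // A range deletion task may be scheduled after orphanCleanupDelaySecs
    // only if no running query is still using any snapshot.
    bool canScheduleRangeDeletionNow() const {
        for (const auto& tracker : snapshots) {
            if (tracker->numOpenCursors > 0) {
                return false;  // must wait for these queries to drain first
            }
        }
        return true;
    }
};
```

As long as the manager retains every snapshot, the check correctly forces the range deleter to wait for in-progress queries.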

      However, when the filtering metadata is cleared, the CollectionShardingRuntime loses track of the previous metadata managers. As a consequence, the range deleter may not honor the promise that all queries running on the shard primary have completed before it starts deleting documents from an orphaned range.
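The failure mode can be illustrated with the same simplified, hypothetical model (names and structure do not match the real MongoDB sources): clearing the filtering metadata swaps in a fresh manager, so the open-cursor counts of the old one are discarded and the safety check passes even while a query is still running.

```cpp
#include <memory>
#include <vector>

// Hypothetical simplified model; names do not match the real sources.
struct MetadataTracker {
    int numOpenCursors = 0;  // queries still using this snapshot
};

struct MetadataManager {
    std::vector<std::shared_ptr<MetadataTracker>> snapshots;

    bool canScheduleRangeDeletionNow() const {
        for (const auto& tracker : snapshots) {
            if (tracker->numOpenCursors > 0) return false;
        }
        return true;
    }
};

struct CollectionShardingRuntime {
    std::shared_ptr<MetadataManager> manager =
        std::make_shared<MetadataManager>();

    // Buggy behavior: clearing the filtering metadata replaces the manager,
    // discarding the old one together with its open-cursor bookkeeping, so
    // a subsequent canScheduleRangeDeletionNow() wrongly reports "safe".
    void clearFilteringMetadata() {
        manager = std::make_shared<MetadataManager>();
    }
};
```

After clearFilteringMetadata() runs, the new manager has no record of the still-open cursor, so the range deletion is wrongly considered safe to schedule.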

      As a consequence, the following claim from the documentation has always been incorrect: "Before deleting the chunk during chunk migration, MongoDB waits for orphanCleanupDelaySecs or for in-progress queries involving the chunk to complete on the shard primary, whichever is longer".

      This bug can be traced back to v5.0.

            Assignee:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
            Reporter:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
            Votes:
            0
            Watchers:
            9

              Created:
              Updated:
              Resolved: