Orphans accumulate after transitionToDedicatedConfigServer races with migration commit to drop config.rangeDeletions

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Assigned Teams: Cluster Scalability
    • Operating System: ALL
    • Sprint: ClusterScalability 27Apr-11May
    • 200

      This build failure (BF) was caught by a post-test cleanup hook that asserts no orphan documents exist on any shard. The orphans are produced by a race between a chunk migration commit and a concurrently running transitionToDedicatedConfigServer.

      During migration commit, the donor shard registers an in-memory range deletion task in a pending state, then immediately calls markAsReadyRangeDeletionTaskLocally to remove the pending field from the corresponding on-disk document in config.rangeDeletions. Removing that field triggers an op observer that calls clearPending(), which unblocks the deletion chain. If transitionToDedicatedConfigServer (introduced in SERVER-103990) has already dropped config.rangeDeletions before markAsReadyRangeDeletionTaskLocally runs, the update throws NoMatchingDocument, the catch block swallows the error, and clearPending() is never called. The in-memory deletion chain then stalls indefinitely, and the orphan documents on the former shard are never cleaned up.
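      The failure sequence above can be modelled with a small standalone sketch. It is not mongod code: RangeDeletionTask, RangeDeletionsCollection, markAsReadyLocally, commitMigration and deletionUnblocked are hypothetical stand-ins, a std::map plays the role of config.rangeDeletions, and a std::promise plays the role of the pending state that clearPending() resolves. It shows that when the collection has been dropped, the swallowed NoMatchingDocument leaves the task permanently pending.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the NoMatchingDocument error.
struct NoMatchingDocument : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical in-memory range deletion task: deletion may only start
// once clearPending() fulfils the promise.
struct RangeDeletionTask {
    std::promise<void> pendingCleared;
    std::shared_future<void> readyToDelete = pendingCleared.get_future().share();
    void clearPending() { pendingCleared.set_value(); }
};

// Hypothetical model of config.rangeDeletions: task id -> "pending" field present.
using RangeDeletionsCollection = std::map<std::string, bool>;

// Models markAsReadyRangeDeletionTaskLocally plus the op observer:
// unsetting the pending field is what triggers clearPending().
void markAsReadyLocally(RangeDeletionsCollection& coll, const std::string& id,
                        RangeDeletionTask& task) {
    auto it = coll.find(id);
    if (it == coll.end()) {
        throw NoMatchingDocument("no config.rangeDeletions doc for " + id);
    }
    it->second = false;   // unset the pending field
    task.clearPending();  // op observer fires on the update
}

// Models the commit path: the error is caught and ignored, so a dropped
// collection means clearPending() is never reached.
void commitMigration(RangeDeletionsCollection& coll, const std::string& id,
                     RangeDeletionTask& task) {
    try {
        markAsReadyLocally(coll, id, task);
    } catch (const NoMatchingDocument&) {
        // Intentionally empty, mirroring the behaviour described in the report.
    }
}

// True once the deletion chain has been unblocked (non-blocking check).
bool deletionUnblocked(const RangeDeletionTask& task) {
    return task.readyToDelete.wait_for(std::chrono::milliseconds(0)) ==
           std::future_status::ready;
}
```

      In the normal case the document exists and deletionUnblocked returns true right after commitMigration; if the map is empty (the collection was dropped by the transition first), it stays false forever, which is the stalled chain that leaves orphans behind.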

      A secondary consequence applies to the migration that is actively in flight during the race: if that moveChunk invocation was issued with waitForDelete=true (either directly by a user or via balancer configuration), the command blocks on the stuck completion future with no deadline and hangs until the operation context is killed by a client timeout, stepdown, or shutdown.
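      The hang can be illustrated with a bounded wait on the same kind of completion future. This is only a sketch: waitForRangeDeletion is a hypothetical helper, and its deadline parameter stands in for whatever eventually interrupts the real wait (client timeout, stepdown or shutdown). The actual waitForDelete path described here waits with no deadline, which corresponds to calling completion.wait() and would never return for a stuck task.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper: returns true if range deletion completed within
// `deadline`, false if the caller gave up first. A waitForDelete=true
// moveChunk with no deadline behaves like completion.wait(), which never
// returns when clearPending() was skipped.
bool waitForRangeDeletion(const std::shared_future<void>& completion,
                          std::chrono::milliseconds deadline) {
    return completion.wait_for(deadline) == std::future_status::ready;
}
```

      A future whose promise is never fulfilled (the raced task) times out, while a completed deletion returns immediately; without any deadline, the first case is the indefinite hang.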

            Assignee:
            Rehan Gill
            Reporter:
            Rehan Gill
            Votes:
            0
            Watchers:
            6
