Fix 3-way deadlock between moveChunk-setAllowChunkOperations-splitChunks


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2026-06-22
    • 🟥 DDL

      This deadlock is caused by a combination of changes made under the Authoritative Shards project (featureFlagAuthoritativeShardsDDL).
       
      The test hangs because a chunk migration is stuck: its session migration destination thread cannot check out a session that is already held by ShardsvrSetAllowChunkOperationsCommand. That command in turn waits for a SplitChunkCoordinator to complete, but the coordinator cannot register itself in ActiveMigrationsRegistry because the stuck migration itself holds the receive-chunk slot.

      3-way deadlock on the recipient shard of a migration:

      Chunk migration (needs to check out a 'migrated' session id) -> SetAllowChunkOperations (has checked out the same session id and waits for the split to finish) -> Split (needs to acquire an ActiveMigrationsRegistry slot) -> Chunk migration (holds the ActiveMigrationsRegistry slot)
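
The dependency chain above can be modelled as a wait-for graph in which each participant points at the participant it is blocked on. The Python sketch below is illustrative only and is not server code: the `WAIT_FOR` table and the `find_cycle` helper are hypothetical names introduced here, and the node labels are simply the actors named in this ticket. It follows the edges from the migration and reports the cycle.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph built from the ticket description:
# each actor maps to the actor it is blocked on.
WAIT_FOR: Dict[str, str] = {
    # Needs the 'migrated' session that the command has checked out.
    "ChunkMigration": "SetAllowChunkOperations",
    # Holds that session and waits for the split to finish.
    "SetAllowChunkOperations": "SplitChunkCoordinator",
    # Needs the ActiveMigrationsRegistry slot held by the migration.
    "SplitChunkCoordinator": "ChunkMigration",
}


def find_cycle(wait_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`.

    Returns the cycle (first node repeated at the end) if one is reached,
    or None if the chain ends at an actor that is not waiting on anything.
    """
    path: List[str] = []
    position: Dict[str, int] = {}  # actor -> index in `path`
    node: Optional[str] = start
    while node is not None:
        if node in position:
            # Revisited an actor: everything from its first visit is the cycle.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = wait_for.get(node)
    return None


cycle = find_cycle(WAIT_FOR, "ChunkMigration")
if cycle is not None:
    print(" -> ".join(cycle))
```

In this model, removing any single edge (for example, if the split no longer had to wait on the migration's slot) makes `find_cycle` return `None`. Which edge the actual fix removes is not stated in this ticket.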

            Assignee:
            Aitor Esteve Alvarado
            Reporter:
            Silvia Surroca
            Votes:
            0
            Watchers:
            2
