Deadlock between FCV upgrade of authoritative shards and removeShard


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0
    • Component/s: None
    • Catalog and Routing
    • CAR Team 2025-11-10
    • 2
    • 🟩 Routing and Topology

      When enabling the authoritative shards feature as part of an FCV upgrade, setFCV tells every shard to clone the authoritative metadata. To do this, it first grabs the shard membership lock to get a stable shard list, then sends a command to each shard to start a DDL coordinator.

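      This is problematic because removeShard, which can run concurrently with an FCV upgrade, first blocks DDL coordinators and then grabs the shard membership lock, in effect acquiring the two resources in the opposite order. Thus, the two operations can interleave such that setFCV holds the shard membership lock while waiting to start DDL coordinators, removeShard has DDL coordinators blocked while waiting for the shard membership lock, and neither can make progress.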
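      The lock-order inversion described above can be sketched in isolation. This is a minimal illustration in Python, not the server code: both resources are modeled as plain mutexes (treating "DDL coordinators blocked" as a lock is a simplifying assumption), and the names run_interleaving, membership_lock and ddl_block are hypothetical. Barriers force the problematic interleaving, and a timeout stands in for what would be an indefinite wait.

```python
import threading


def run_interleaving(timeout=0.2):
    """Force the interleaving in which each operation holds its first
    resource before either attempts its second one."""
    # Hypothetical stand-ins: the real server resources are not plain mutexes.
    membership_lock = threading.Lock()  # shard membership lock
    ddl_block = threading.Lock()        # "DDL coordinators are blocked"

    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    acquired_second = {}

    def operation(name, first, second):
        with first:
            # Wait until the other operation also holds its first resource.
            both_hold_first.wait()
            # Without the timeout this call would never return.
            got = second.acquire(timeout=timeout)
            acquired_second[name] = got
            # Keep holding `first` until the other attempt has finished.
            both_tried_second.wait()
            if got:
                second.release()

    threads = [
        # setFCV: membership lock first, then needs to start DDL coordinators.
        threading.Thread(target=operation,
                         args=("setFCV", membership_lock, ddl_block)),
        # removeShard: blocks DDL coordinators first, then membership lock.
        threading.Thread(target=operation,
                         args=("removeShard", ddl_block, membership_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired_second


if __name__ == "__main__":
    print(run_interleaving())
```

      Under this forced interleaving neither operation obtains its second resource, so both entries in the returned dictionary are False. In the server there is no timeout, so the equivalent waits would not end on their own.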

      A reproducer is attached. This issue exists with both the legacy removeShard (upgrade from v8.0 to v8.3), and the new removeShard coordinator (upgrade from v8.2 to v8.3). It could also be a problem with other FCV upgrade features if they spawn DDL coordinators similarly.

        1. repro-SERVER-112610.patch
          7 kB
          Joan Bruguera Micó

            Assignee:
            Allison Easton
            Reporter:
            Joan Bruguera Micó
            Votes:
            0
            Watchers:
            3
