Authoritative metadata cloning DDL may run on fully upgraded shards


    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2026-06-22
    • 200

      As part of Authoritative Shards, setFCV clones the authoritative DB/collection metadata from the config server to the shards. This is done by having the config server spawn the cloning DDL coordinator on the shards during its kUpgrading transitional FCV.

       

      The expectation is that the recipient shards are also in kUpgrading. This holds on the "happy path"; however, there are some edge cases where it does not.

       

      Edge case 1: Retrying a setFCV that was interrupted after the shards were sent to FCV 9.0 (UPGRADED).

      1. All nodes are on FCV 8.0, and the user starts an FCV upgrade to 9.0.
      2. During the kPrepare phase (all configsvr+shardsvrs on kUpgrading FCV), we clone the authoritative metadata from the configsvr to the shards.
      3. The config server enters the kComplete phase and sends the shards to FCV 9.0 (UPGRADED).
      4. However, right before the config server itself goes to FCV 9.0 (UPGRADED), it steps down.
      5. The setFCV upgrade to 9.0 is retried. This re-executes all the (kStart, kPrepare, kComplete) phases.
      6. During the re-execution of the kPrepare phase, we re-send the authoritative metadata clone request to the shards, despite the shards already being in FCV 9.0 (UPGRADED).

      This edge case cannot happen with Symmetric FCV: since SERVER-119476, steps 5-6 resume the upgrade from the kComplete phase without re-executing kPrepare.
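
To make the sequence in edge case 1 concrete, here is a minimal toy model in Python. It is only a sketch and not the server implementation: `Fcv`, `run_set_fcv`, and the `start_phase` / `step_down_before_config_commit` parameters are hypothetical names invented for illustration. It assumes the config server drives the three phases in order and that kPrepare sends one clone request per shard.

```python
from enum import Enum


class Fcv(Enum):
    DOWNGRADED = "8.0"
    UPGRADING = "kUpgrading"
    UPGRADED = "9.0"


PHASES = ["kStart", "kPrepare", "kComplete"]


def run_set_fcv(shards, start_phase="kStart", step_down_before_config_commit=False):
    """Toy model of one setFCV upgrade attempt (hypothetical helper).

    Returns (config_fcv, clone_log). clone_log records the FCV each shard
    was in at the moment it received the clone request.
    """
    clone_log = []
    config_fcv = Fcv.UPGRADING
    for phase in PHASES[PHASES.index(start_phase):]:
        if phase == "kStart":
            # Only shards still on 8.0 move to the transitional FCV.
            for name, fcv in shards.items():
                if fcv is Fcv.DOWNGRADED:
                    shards[name] = Fcv.UPGRADING
        elif phase == "kPrepare":
            # The clone request is sent to every shard, whatever its FCV.
            for name, fcv in shards.items():
                clone_log.append((name, fcv))
        elif phase == "kComplete":
            for name in shards:
                shards[name] = Fcv.UPGRADED
            if step_down_before_config_commit:
                # Step 4: the config server steps down before its own commit.
                return config_fcv, clone_log
            config_fcv = Fcv.UPGRADED
    return config_fcv, clone_log


# Steps 1-4: the first attempt is interrupted after the shards are upgraded.
shards = {"shard0": Fcv.DOWNGRADED, "shard1": Fcv.DOWNGRADED}
first_fcv, first_log = run_set_fcv(shards, step_down_before_config_commit=True)

# Steps 5-6, full re-execution: shards receive the clone while UPGRADED.
retry_fcv, retry_log = run_set_fcv(dict(shards))

# Alternative: resuming from kComplete sends no clone request at all.
resume_fcv, resume_log = run_set_fcv(dict(shards), start_phase="kComplete")
```

In this model `retry_log` contains only UPGRADED recipients, while `resume_log` is empty, which mirrors why resuming from kComplete avoids the problem.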

       

      Edge case 2: Config-server-only "downgrading to upgrading" FCV transition

      • All nodes are on FCV 9.0, and setFCV starts an FCV downgrade to 8.0.
      • During the kStart phase, the Config Server sets its FCV to kDowngrading and then fails right after, before it could send any of the shards to kDowngrading.
      • The user then decides to re-upgrade to FCV 9.0.
      • The config server will then set its FCV to kUpgrading and re-execute all the (kStart, kPrepare, kComplete) phases. The shards will do nothing since they are already on FCV 9.0 (UPGRADED).
      • During the kPrepare phase, we re-send the authoritative metadata clone request to the shards, despite the shards already being in FCV 9.0 (UPGRADED).

      This edge case can still happen with Symmetric FCV.
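
The state divergence in edge case 2 can be traced with a small Python walk-through. This is an illustrative sketch only: the function name `reupgrade_after_failed_downgrade` and the string labels for FCV states are hypothetical and do not correspond to server code.

```python
def reupgrade_after_failed_downgrade(shard_names):
    """Toy walk-through of edge case 2 (hypothetical helper).

    Returns (timeline, clone_recipients), where timeline holds
    (step, config_fcv, shard_fcv_snapshot) tuples and clone_recipients
    maps each shard to the FCV it was in when the clone request arrived.
    """
    config_fcv = "UPGRADED"
    shard_fcv = {name: "UPGRADED" for name in shard_names}
    timeline = [("initial", config_fcv, dict(shard_fcv))]

    # Downgrade kStart: the config server transitions first, then fails
    # before sending any shard to kDowngrading.
    config_fcv = "kDowngrading"
    timeline.append(("downgrade kStart, then failure", config_fcv, dict(shard_fcv)))

    # Re-upgrade kStart: only the config server changes state, because the
    # shards never left FCV 9.0.
    config_fcv = "kUpgrading"
    timeline.append(("re-upgrade kStart", config_fcv, dict(shard_fcv)))

    # Re-upgrade kPrepare: record the FCV of every clone recipient.
    clone_recipients = dict(shard_fcv)

    # Re-upgrade kComplete: the config server finishes the upgrade.
    config_fcv = "UPGRADED"
    timeline.append(("re-upgrade kComplete", config_fcv, dict(shard_fcv)))
    return timeline, clone_recipients


timeline, clone_recipients = reupgrade_after_failed_downgrade(["shard0", "shard1"])
```

Every entry in `clone_recipients` is UPGRADED here, and no phase-resumption logic on the config server changes that, because the shards were never part of the interrupted downgrade.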

       

      Fix: For idempotency, shards should treat a cloning DDL request as a no-op when they are already on FCV 9.0, since they are already authoritative.
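
The proposed guard might look roughly like the following Python sketch. It is not the actual coordinator code: the `Shard` class, its fields, and `handle_clone_request` are hypothetical names, and the model assumes a shard on FCV 9.0 already holds authoritative metadata that must not be overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for a shard's authoritative-metadata state."""
    fcv: str
    metadata: dict = field(default_factory=dict)
    clones_executed: int = 0

    def handle_clone_request(self, config_metadata):
        # Already fully upgraded: the shard is authoritative, so the
        # request is acknowledged without doing any work.
        if self.fcv == "UPGRADED":
            return "noop"
        # Otherwise copy the metadata from the config server.
        self.metadata = dict(config_metadata)
        self.clones_executed += 1
        return "cloned"


config_metadata = {"db.coll": {"uuid": "example"}}

upgrading = Shard(fcv="kUpgrading")
upgraded = Shard(fcv="UPGRADED", metadata={"db.coll": {"uuid": "local"}})

result_upgrading = upgrading.handle_clone_request(config_metadata)
result_upgraded = upgraded.handle_clone_request(config_metadata)
```

With this guard, a repeated or late clone request leaves an upgraded shard's metadata untouched, so both edge cases above become harmless.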

            Assignee:
            Joan Bruguera Micó
            Reporter:
            Joan Bruguera Micó