Avoid asserting "phase was already done" if starting a new FCV transition


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: 9.0.0-rc0
    • Component/s: Upgrade/Downgrade
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2026-05-11
    • 0
    • 🟩 Routing and Topology

      Since SERVER-119476, shards reject a setFCV request from the config server if they detect that the same FCV transition is already in progress locally and has advanced to a later phase than the one the config server requests. This is a guardrail against network replays that could cause setFCV to "go back in time".
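A minimal Python sketch of the guardrail as described above, assuming a shard's in-progress transition can be reduced to a transition kind plus a phase. All names here (`Phase`, `ShardFcvState`, `SetFcvRequest`, `should_reject_replay`) are hypothetical illustrations, not MongoDB server code; only the "start" and "prepare" phases come from this issue.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    # Only the phases named in this issue are modeled; ordering is by progress.
    START = 0
    PREPARE = 1


@dataclass(frozen=True)
class ShardFcvState:
    # Hypothetical model of the shard's in-progress FCV transition.
    transition: str  # e.g. "downgrading" or "upgrading"
    phase: Phase


@dataclass(frozen=True)
class SetFcvRequest:
    # Hypothetical model of a setFCV request sent by the config server.
    transition: str
    phase: Phase


def should_reject_replay(state: ShardFcvState, request: SetFcvRequest) -> bool:
    """Reject if the same transition is ongoing and the shard is already
    strictly past the requested phase (a likely replayed request)."""
    same_transition = state.transition == request.transition
    return same_transition and state.phase > request.phase
```

Under this model, a replayed "start" request reaching a shard that is already in "prepare" of the same transition is rejected, while a request for the same or a later phase is allowed through.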

       

      However, there is an edge case where this check triggers incorrectly when a "downgrading to upgrading" transition is followed by an "upgrading to downgrading" transition:

      1. Initial state: The configsvr and the shardsvr are both in FCV 9.0.
      2. First attempt to downgrade to FCV 8.0: We request a downgrade to FCV 8.0. Both the configsvr and the shardsvr set the FCV to "downgrading" and advance to the "prepare" phase; setFCV then fails due to incompatible metadata.
      3. Failed attempt to return to FCV 9.0: We request to set the FCV back to 9.0 ("downgrading to upgrading" transition). The configsvr sets the FCV to "upgrading" and then immediately steps down in the "start" phase. The shard was not contacted, so it's still in the "prepare" phase of "downgrading".
      4. Second attempt to downgrade to FCV 8.0: We request another downgrade to FCV 8.0, which is allowed by the "upgrading to downgrading" transition. The configsvr sets the FCV to "downgrading" and then asks the shard to run the "start" phase of "downgrading".
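The four steps above can be replayed against the pre-fix check in a small Python model to show why step 4 is rejected. This is a hypothetical sketch: each node is reduced to a dict holding a `transition` and a `phase`, which is not the server's actual FCV document, and `pre_fix_rejects` and `run_scenario` are illustrative names.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Only the phases named in this issue are modeled.
    START = 0
    PREPARE = 1


def pre_fix_rejects(shard: dict, request: tuple) -> bool:
    """Pre-fix guardrail: same transition and the shard is already further ahead."""
    kind, phase = request
    return shard["transition"] == kind and shard["phase"] > phase


def run_scenario() -> bool:
    """Replays steps 1-4; returns True if the shard rejects step 4's request."""
    # Step 1: both nodes are at FCV 9.0 with no transition in progress.
    config = {"transition": None, "phase": None}
    shard = {"transition": None, "phase": None}

    # Step 2: the first downgrade reaches "prepare" on both nodes, then fails.
    for node in (config, shard):
        node.update(transition="downgrading", phase=Phase.PREPARE)

    # Step 3: only the configsvr enters "upgrading"/"start" before stepping down;
    # the shard is never contacted and stays in "prepare" of "downgrading".
    config.update(transition="upgrading", phase=Phase.START)

    # Step 4: a new downgrade; the configsvr asks the shard to run "start".
    config.update(transition="downgrading", phase=Phase.START)
    return pre_fix_rejects(shard, ("downgrading", Phase.START))


print(run_scenario())  # True: the shard rejects a legitimate new downgrade
```

The model only looks at the transition kind and phase, so the shard cannot tell the stale first downgrade apart from the new second one.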

       

      At this point, the shard incorrectly rejects the request, because it is in the "prepare" phase of "downgrading" and receives a request for the "start" phase of "downgrading". However, the check does not take into account that the first and second "downgrading" requests are different: the second one is a new "incarnation" with a higher changeTimestamp.

       

      The assertion should not trigger in this case, since the request represents further progress rather than a replay.
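One way to express the corrected check, sketched in Python under stated assumptions: each transition carries its changeTimestamp (modeled as an int here), and the shard compares `(changeTimestamp, phase)` pairs instead of phases alone. The names `Transition` and `should_reject` are hypothetical, and treating a request with a *lower* changeTimestamp as stale is an assumption of this sketch; the issue only states that a higher changeTimestamp marks a new incarnation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    START = 0
    PREPARE = 1


@dataclass(frozen=True)
class Transition:
    # Hypothetical model: transition kind, the changeTimestamp identifying this
    # "incarnation" (an int for simplicity), and the current/requested phase.
    kind: str
    change_timestamp: int
    phase: Phase


def should_reject(local: Transition, request: Transition) -> bool:
    """Reject only requests that would move the shard backwards.

    Same changeTimestamp, earlier phase: a replay, rejected.
    Higher changeTimestamp: a new incarnation, accepted at any phase.
    Lower changeTimestamp: treated as stale (assumption of this sketch).
    """
    if local.kind != request.kind:
        return False
    # Lexicographic tuple comparison: timestamp first, then phase.
    return (request.change_timestamp, request.phase) < (
        local.change_timestamp,
        local.phase,
    )
```

Applied to step 4, a shard holding `Transition("downgrading", 100, Phase.PREPARE)` accepts `Transition("downgrading", 200, Phase.START)` because the newer changeTimestamp wins the comparison, while a replayed `Transition("downgrading", 100, Phase.START)` is still rejected.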

            Assignee:
            Joan Bruguera Micó
            Reporter:
            Joan Bruguera Micó
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: