Disallow lastLTS to lastContinuous FCV upgrade on addShard containing user data


    • Catalog and Routing
    • Minor Change
    • CAR Team 2025-09-15, CAR Team 2025-09-29
    • 🟩 Routing and Topology

      In a sharded cluster, all shards must have the same FCV, so the addShard command will upgrade/downgrade the FCV of the shard to be added to match the rest of the sharded cluster (incl. the config server).

       

      When adding a new, empty shard to the cluster, it may happen that:

      • The binaries of the config server are latest (e.g. v8.3) but the FCV is lastContinuous (e.g. v8.2).
      • The binaries of the new, empty shard are also latest (e.g. v8.3) but the FCV is lastLTS (e.g. v8.0) - note this is the default when a shard server is created.

      If this happens, we upgrade the FCV using an internal path from lastLTS to lastContinuous (e.g. from FCV v8.0 to FCV v8.2 on v8.3 binaries). Since the shard is brand new, this always succeeds and is easily verified to be correct.
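      The version relationship in this scenario can be illustrated with a small sketch. It is a simplified model, not the server's implementation: the names NAMED_FCVS and classify_add_shard_transition are hypothetical, and the table only encodes the example versions used above for v8.3 binaries.

```python
# Hypothetical model of the named FCV values available on v8.3 binaries,
# using the example versions from this ticket.
NAMED_FCVS = {"lastLTS": "8.0", "lastContinuous": "8.2", "latest": "8.3"}


def classify_add_shard_transition(shard_fcv: str, cluster_fcv: str) -> str:
    """Describe the FCV change addShard would need on the incoming shard."""
    names = {version: name for name, version in NAMED_FCVS.items()}
    if shard_fcv not in names or cluster_fcv not in names:
        raise ValueError("FCV not supported by these binaries")
    if shard_fcv == cluster_fcv:
        return "no change"
    return f"{names[shard_fcv]} -> {names[cluster_fcv]}"


# New shard at the default FCV (8.0), cluster at FCV 8.2, all on v8.3 binaries.
print(classify_add_shard_transition("8.0", "8.2"))  # prints "lastLTS -> lastContinuous"
```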

       

      However, addShard can also be used to convert/promote an existing, non-empty replica set to a sharded cluster. If the versions line up in the same way (e.g. the existing replica set has FCV v8.0, the sharded cluster's config server has FCV v8.2, and both are using v8.3 binaries), it will also trigger an upgrade from lastLTS to lastContinuous. This is not only unintended but problematic because:

      • This FCV upgrade path has no test coverage for the scenario where the replica set contains user data.
      • If the FCV upgrade fails, remediation is complicated since this FCV upgrade is not user-facing (it cannot be retried via setFCV, only via addShard).
      • Supporting the abort of a failed upgrade due to removed features (SERVER-107829) is complex since it would require a new "downgrading FCV from lastContinuous to lastLTS" path.

       

      We should instead make this unsupported promotion scenario fail with an error. If this happens, it can be resolved by any of the following:

      • Downgrading the shard to lastLTS binaries and re-creating the config server on lastLTS binaries. The FCV remains unchanged after promotion.
      • Re-creating the config server on latest binaries but lastLTS FCV. The FCV remains unchanged after promotion.
      • Downgrading the config server binaries to lastContinuous (e.g. v8.2) then doing the promotion. Note that this causes an FCV upgrade (from v8.0 to v8.2).
      • Upgrading the config server to latest FCV (e.g. v8.3) then doing the promotion. Note that this causes an FCV upgrade (from v8.0 to v8.3).
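
      The proposed behavior can be sketched as a guard that runs before addShard changes the FCV. This is only an illustration under stated assumptions: ReplicaSet, UnsupportedPromotionError and check_add_shard_fcv are hypothetical names, FCVs are given as names relative to the config server's binaries, and the real check and error code are not specified in this ticket.

```python
from dataclasses import dataclass


class UnsupportedPromotionError(Exception):
    """Hypothetical error for the rejected promotion scenario."""


@dataclass
class ReplicaSet:
    fcv: str             # "lastLTS", "lastContinuous" or "latest"
    has_user_data: bool  # False for a brand new, empty shard


def check_add_shard_fcv(replica_set: ReplicaSet, cluster_fcv: str) -> None:
    """Raise if addShard would need the internal lastLTS -> lastContinuous
    upgrade on a replica set that already contains user data."""
    internal_path = (
        replica_set.fcv == "lastLTS" and cluster_fcv == "lastContinuous"
    )
    if internal_path and replica_set.has_user_data:
        raise UnsupportedPromotionError(
            "cannot upgrade FCV from lastLTS to lastContinuous on a "
            "replica set that contains user data"
        )


# An empty shard may still take the internal upgrade path.
check_add_shard_fcv(ReplicaSet("lastLTS", has_user_data=False), "lastContinuous")

# With the cluster at latest FCV, the promotion does not use the internal path.
check_add_shard_fcv(ReplicaSet("lastLTS", has_user_data=True), "latest")
```

      In this model only the combination of lastLTS on the replica set, lastContinuous on the cluster, and existing user data is rejected, which matches the scenario described above.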

            Assignee:
            Joan Bruguera Micó
            Reporter:
            Joan Bruguera Micó