-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 8.3.0-rc0
-
Component/s: Sharding, Upgrade/Downgrade
-
Catalog and Routing
-
Minor Change
-
CAR Team 2025-09-15, CAR Team 2025-09-29
-
🟩 Routing and Topology
In a sharded cluster, all shards must have the same FCV, so the addShard command will upgrade/downgrade the FCV of the shard to be added to match the rest of the sharded cluster (incl. the config server).
When adding a new, empty shard to the cluster, it may happen that:
- The binaries of the config server are latest (e.g. v8.3) but the FCV is lastContinuous (e.g. v8.2).
- The binaries of the new, empty shard are also latest (e.g. v8.3) but the FCV is lastLTS (e.g. v8.0) - note this is the default when a shard server is created.
If this happens, we upgrade the FCV using an internal path from lastLTS to lastContinuous (e.g. from FCV v8.0 to FCV v8.2 on v8.3 binaries). Since the shard is brand new, this always succeeds and is easily verified to be correct.
However, addShard can also be used to convert/promote an existing, non-empty replica set to a sharded cluster. If the versions line up in the same way (e.g. the existing replica set has FCV v8.0, the sharded cluster's config server has FCV v8.2, and both are using v8.3 binaries), it will also trigger an upgrade from lastLTS to lastContinuous. This is not only unintended but problematic because:
- This FCV upgrade path has no test coverage for the scenario where the replica set contains user data.
- If the FCV upgrade fails, remediation is complicated since this FCV upgrade is not user facing (it cannot be retried via setFCV, only via addShard).
- Supporting aborting a failed upgrade due to removed features (SERVER-107829) is complex, since it would require a new "downgrading FCV from lastContinuous to lastLTS" path.
We should instead make this unsupported promotion scenario fail with an error. If this happens, it can be resolved by one of the following:
- Downgrading the shard to lastLTS binaries and re-creating the config server on lastLTS binaries. The FCV remains unchanged after promotion.
- Re-creating the config server on latest binaries but lastLTS FCV. The FCV remains unchanged after promotion.
- Downgrading the config server binaries to lastContinuous (e.g. v8.2) then doing the promotion. Note that this causes an FCV upgrade (from v8.0 to v8.2).
- Upgrading the config server to latest FCV (e.g. v8.3) then doing the promotion. Note that this causes an FCV upgrade (from v8.0 to v8.3).
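The proposed behavior can be sketched as a small decision function. This is only an illustration of the rule described in this ticket, not the server implementation: the names `Fcv`, `AddShardFcvError`, and `plan_add_shard_fcv` are hypothetical, and the returned strings are placeholders for whatever the real addShard code path does.

```python
from enum import Enum


class Fcv(Enum):
    # Symbolic FCV values; the version numbers are the examples used in this ticket.
    LAST_LTS = "lastLTS"                # e.g. v8.0
    LAST_CONTINUOUS = "lastContinuous"  # e.g. v8.2
    LATEST = "latest"                   # e.g. v8.3


class AddShardFcvError(Exception):
    """Hypothetical error for the unsupported promotion scenario."""


def plan_add_shard_fcv(cluster_fcv, shard_fcv, shard_has_user_data):
    """Return a description of the FCV action addShard would take.

    Raises AddShardFcvError when a non-empty replica set would need the
    internal lastLTS -> lastContinuous upgrade path.
    """
    if shard_fcv is cluster_fcv:
        # FCVs already match, so the FCV remains unchanged.
        return "no FCV change"

    if shard_fcv is Fcv.LAST_LTS and cluster_fcv is Fcv.LAST_CONTINUOUS:
        if shard_has_user_data:
            # The scenario this ticket proposes to reject.
            raise AddShardFcvError(
                "lastLTS to lastContinuous FCV upgrade is not supported "
                "when the shard being added contains user data"
            )
        # Brand-new, empty shard: the internal path is still allowed.
        return "upgrade lastLTS -> lastContinuous (internal path)"

    # Any other mismatch: the shard's FCV is changed to match the cluster.
    return f"set shard FCV to {cluster_fcv.value}"
```

Under these assumptions, promoting a non-empty replica set at lastLTS into a cluster at lastContinuous raises the error, while the same promotion into a cluster at latest FCV (the last workaround above) proceeds with an ordinary upgrade.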
- is depended on by
  - SERVER-105827 Enable feature flag (Closed)
  - COMPASS-9865 Investigate changes in SERVER-109474: Disallow lastLTS to lastContinuous FCV upgrade on addShard containing user data (Needs Triage)
  - TOOLS-3979 Investigate changes in SERVER-109474: Disallow lastLTS to lastContinuous FCV upgrade on addShard containing user data (Closed)
- is related to
  - SERVER-107829 Allow FCV transition from isUpgrading to Downgraded (Closed)
- related to
  - SERVER-109583 Don't implicitly upgrade the FCV during promotion from replica set to sharded cluster via addShard (Backlog)