Investigate changes in SERVER-109474: Disallow lastLTS to lastContinuous FCV upgrade on addShard containing user data

    • Type: Investigation
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Developer Tools

      Original Downstream Change Summary

      Starting in MongoDB v8.3, it will not be possible to convert an existing replica set to a sharded cluster (https://www.mongodb.com/docs/v8.0/tutorial/convert-replica-set-to-replicated-shard-cluster/) in the following scenario:

      • Both the binaries of the existing replica set and the config server of the new sharded cluster are latest (e.g. 8.3).
      • The FCV of the existing replica set is previous LTS (e.g. 8.0).
      • The FCV of the config server is the previous rapid release (e.g. 8.2).

      An IllegalOperation error is returned because the required FCV upgrade (e.g. FCV 8.0 to 8.2 on 8.3 binaries) is unsupported.

      The preferred way to convert a replica set to a sharded cluster is for all nodes to have the same binary and FCV version. This allows the conversion to be done without any FCV change.
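
      As a pre-flight check for the preferred path, the FCV of the replica set and of the config server can be compared before attempting the conversion. The sketch below is illustrative only: it assumes each reply has the shape returned by the getParameter command with featureCompatibilityVersion: 1 (a "featureCompatibilityVersion" sub-document with a "version" field, plus a "targetVersion" field while a transition is in progress). The helper names fcv_from_reply and same_fcv are hypothetical, and fetching the replies from each node is left to the driver or shell in use.

```python
from typing import Any, Mapping


def fcv_from_reply(reply: Mapping[str, Any]) -> str:
    """Extract the FCV string from a getParameter reply (assumed shape)."""
    fcv_doc = reply["featureCompatibilityVersion"]
    # A "targetVersion" field is assumed to mean an FCV transition is under way,
    # in which case "version" alone does not describe a stable state.
    if "targetVersion" in fcv_doc:
        raise ValueError("FCV transition in progress; retry once it has finished")
    return fcv_doc["version"]


def same_fcv(replica_set_reply: Mapping[str, Any],
             config_server_reply: Mapping[str, Any]) -> bool:
    """Return True when both nodes report the same stable FCV."""
    return fcv_from_reply(replica_set_reply) == fcv_from_reply(config_server_reply)


# Replies as they might look for the scenario described above (made-up sample data).
replica_set_reply = {"featureCompatibilityVersion": {"version": "8.0"}, "ok": 1.0}
config_server_reply = {"featureCompatibilityVersion": {"version": "8.2"}, "ok": 1.0}
fcvs_match = same_fcv(replica_set_reply, config_server_reply)
```

      If the two values differ, the conversion would require an FCV change on the replica set, which is the situation this ticket is about.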

      For more detail see SERVER-109474.

      Description of Linked Ticket

      In a cluster, all shards must have the same FCV, so the addShard command will upgrade/downgrade the FCV of the shard to be added to match the rest of the sharded cluster (incl. the FCV in the config server).
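
      The direction of the FCV change that addShard would need can be illustrated with a small comparison. This is a minimal sketch, not the server's logic: parse_fcv and required_fcv_change are hypothetical names, and FCV values are assumed to be "major.minor" strings such as "8.0".

```python
from typing import Tuple


def parse_fcv(fcv: str) -> Tuple[int, int]:
    """Turn an FCV string such as "8.2" into a comparable (major, minor) tuple."""
    major, minor = fcv.split(".")
    return int(major), int(minor)


def required_fcv_change(shard_fcv: str, cluster_fcv: str) -> str:
    """Describe the change needed for the shard's FCV to match the cluster's FCV."""
    shard, cluster = parse_fcv(shard_fcv), parse_fcv(cluster_fcv)
    if shard < cluster:
        return "upgrade"
    if shard > cluster:
        return "downgrade"
    return "none"


# A shard on FCV 8.0 joining a cluster on FCV 8.2 needs an upgrade.
change = required_fcv_change("8.0", "8.2")
```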

       

      When adding a new, empty shard to the cluster, it may happen that:

      • The binaries of the config server are latest (e.g. v8.3) but the FCV is lastContinuous (e.g. v8.2).
      • The binaries of the shard are also latest (e.g. v8.3) but the FCV is lastLTS (e.g. v8.0) - note this is the default when a shard server is created.

      We internally support an FCV upgrade from lastLTS to lastContinuous (e.g. from FCV v8.0 to FCV v8.2 on v8.3 binaries). Since the shard is empty, this always succeeds and is easily verified to be correct.

       

      However, addShard can also be used to convert/promote an existing, non-empty replica set to a sharded cluster. If the FCVs line up in the same way (e.g. the existing replica set has FCV v8.0, the sharded cluster's config server has FCV v8.2, and both are using v8.3 binaries), it will also trigger an upgrade from lastLTS to lastContinuous. This is not only unintended but problematic because:

      • This FCV upgrade path has no test coverage for the scenario where the replica set contains user data.
      • If the FCV upgrade fails, remediation is complicated since this FCV upgrade is not user-facing (it cannot be retried via setFCV, only via addShard).
      • Supporting the abort of a failed upgrade (SERVER-107829) is complex since it would require adding a "downgrading FCV from lastContinuous to lastLTS" path.

       

      We should instead make this unsupported promotion scenario fail with an error. If this happens it can be resolved by either:

      • Downgrading the shard to lastLTS binaries and re-creating the config server on lastLTS binaries. This is the safest option since the FCV will remain unchanged.
      • Upgrading the config server to latest FCV (e.g. v8.3), then completing the promotion. Note that this causes an FCV upgrade (from v8.0 to v8.3).
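
      The proposed behavior can be sketched as a guard that runs before the FCV change. This is an assumption-laden illustration rather than the actual server change: check_promotion, its parameters, and the default 8.0/8.2 values are hypothetical, and the IllegalOperation class here only mirrors the error name mentioned in the summary above.

```python
class IllegalOperation(Exception):
    """Stand-in for the IllegalOperation error named in the summary."""


def check_promotion(shard_fcv: str,
                    cluster_fcv: str,
                    shard_has_user_data: bool,
                    last_lts: str = "8.0",
                    last_continuous: str = "8.2") -> None:
    """Reject the lastLTS to lastContinuous upgrade when the shard holds user data."""
    needs_internal_upgrade = (shard_fcv == last_lts and cluster_fcv == last_continuous)
    if needs_internal_upgrade and shard_has_user_data:
        raise IllegalOperation(
            "FCV upgrade from lastLTS to lastContinuous is unsupported for a "
            "replica set that contains user data"
        )
    # An empty shard, or any other FCV combination, passes this particular guard.


# An empty shard in the same FCV configuration is still accepted.
check_promotion("8.0", "8.2", shard_has_user_data=False)
```

      Keeping the guard separate from the FCV change itself means the empty-shard path described earlier is left untouched.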

            Assignee:
            Unassigned
            Reporter:
            Backlog - Core Eng Program Management Team
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: