  1. Core Server
  2. SERVER-91837

FCV upgrade cleanup can race with other operations

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • 8.1 Required
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • ALL
    • v8.0

      During FCV upgrade, we:

      1. Transition to the upgradingToXY FCV (for example kUpgradingFrom_7_0_To_8_0).
      2. Take the global lock in S mode, which is supposed to create a barrier between operations that see the old FCV (for example 7.0) and operations that see the new FCV (8.0).
      3. Do upgrade cleanup work, which should happen in either userCollectionsWorkForUpgrade or upgradeServerMetadata.
      4. Complete the upgrade by transitioning to the upgraded X.Y FCV (for example 8.0).

      However, feature flag checks on the upgradingToXY FCV still behave as if the FCV were the old (downgraded) FCV. For example, if a feature flag is enabled on 8.0 and we check whether it is enabled on the kUpgradingFrom_7_0_To_8_0 FCV, the check returns false (not enabled).
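      To make the behavior concrete, here is a minimal Python sketch of the check described above. It is not the server's implementation: the `FCV` enum, the `EFFECTIVE_VERSION` table and `is_flag_enabled` are hypothetical names, and the mapping of the upgrading state to 7.0 is taken only from the description in this ticket.

```python
from enum import Enum


class FCV(Enum):
    """Hypothetical stand-in for the FCV states on the 7.0 -> 8.0 upgrade path."""
    V7_0 = "7.0"
    UPGRADING_7_0_TO_8_0 = "kUpgradingFrom_7_0_To_8_0"
    V8_0 = "8.0"


# Version that a feature flag check effectively compares against in each state.
# Assumption from the ticket text: the upgrading state is treated like 7.0.
EFFECTIVE_VERSION = {
    FCV.V7_0: (7, 0),
    FCV.UPGRADING_7_0_TO_8_0: (7, 0),
    FCV.V8_0: (8, 0),
}


def is_flag_enabled(enabled_on, fcv):
    """Return True if a flag enabled on version `enabled_on` is on under `fcv`."""
    return EFFECTIVE_VERSION[fcv] >= enabled_on


FLAG_ENABLED_ON_8_0 = (8, 0)
for state in FCV:
    # Prints one line per state: the state name and whether the flag is enabled.
    print(state.value, is_flag_enabled(FLAG_ENABLED_ON_8_0, state))
```

      In this model the flag only becomes enabled once the FCV is fully 8.0, so the upgrading state is indistinguishable from 7.0 as far as flag checks are concerned.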

      As a result, there can be a race during steps 3-4: operations that take the global lock after step 2 will see the upgradingToXY FCV, but feature flags enabled on X.Y will still not be enabled. For example, in one check in step 3, we want to disallow a certain type of collection that existed on 7.0, and we expect that new operations only see the new FCV. But since feature flag checks on the kUpgradingFrom_7_0_To_8_0 FCV effectively still behave as if it were the old FCV, new operations will see that the feature flag is still disabled and will still create the 7.0 version of the collection.
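      The interleaving can be sketched as a small, single-threaded Python simulation. Everything here is hypothetical (the `Server` class, the "legacy"/"new" collection formats, and a cleanup step that converts legacy collections); it only illustrates the ordering problem, not the actual server code or the actual cleanup.

```python
class Server:
    """Toy model of a node going through the 7.0 -> 8.0 FCV upgrade."""

    def __init__(self):
        self.fcv = "7.0"
        self.collections = {}  # collection name -> "legacy" or "new" format

    def flag_enabled(self):
        # Assumption from the ticket: the flag is off on the upgrading FCV.
        return self.fcv == "8.0"

    def create_collection(self, name):
        # New collections use the format selected by the feature flag check.
        self.collections[name] = "new" if self.flag_enabled() else "legacy"

    def upgrade_cleanup(self):
        # Hypothetical step 3 work: convert every legacy collection it finds.
        for name in list(self.collections):
            if self.collections[name] == "legacy":
                self.collections[name] = "new"


server = Server()
server.create_collection("a")             # created on 7.0, legacy format
server.fcv = "kUpgradingFrom_7_0_To_8_0"  # step 1
# Step 2: the global S lock would be taken and released here (the barrier).
server.upgrade_cleanup()                  # step 3 handles "a"
server.create_collection("b")             # new operation, flag still disabled
server.fcv = "8.0"                        # step 4

leftover = [n for n, fmt in server.collections.items() if fmt == "legacy"]
print(leftover)  # prints ['b']
```

      Collection "b" is created after the barrier and after cleanup, yet it still ends up in the 7.0 format on a fully upgraded 8.0 node, which is the race described above.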

       

      Note that this is not an issue on FCV downgrade, because feature flag checks on the downgradingToXY FCV behave as if the FCV were already the new (downgraded) FCV. For example, if a feature flag is enabled on 8.0 and we check whether it is enabled on the kDowngradingFrom_8_0_To_7_0 FCV, the check returns false, just as it would on 7.0. The global lock therefore does work as a barrier between the old FCV and the new FCV.
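      The asymmetry between the two directions can be checked with a short Python sketch. The helper names and the "7.0-format"/"8.0-format" labels are hypothetical; the only assumption carried over from this ticket is that the flag is enabled on the fully upgraded 8.0 FCV and disabled on both transitional FCVs.

```python
def flag_enabled(fcv):
    # Assumption from the ticket: only the full 8.0 FCV enables the flag.
    return fcv == "8.0"


def format_for_new_collection(fcv):
    # Format an operation would pick if it ran while the node is on `fcv`.
    return "8.0-format" if flag_enabled(fcv) else "7.0-format"


# direction -> (transitional FCV, target FCV)
transitions = {
    "upgrade": ("kUpgradingFrom_7_0_To_8_0", "8.0"),
    "downgrade": ("kDowngradingFrom_8_0_To_7_0", "7.0"),
}

results = {}
for direction, (transitional, target) in transitions.items():
    created = format_for_new_collection(transitional)
    expected = format_for_new_collection(target)
    # The barrier only works if operations on the transitional FCV already
    # behave the way they will on the target FCV.
    results[direction] = "barrier holds" if created == expected else "race possible"

print(results)
```

      On downgrade, operations that run on the transitional FCV already behave like 7.0, so nothing created after the barrier needs to be cleaned up again. On upgrade they still behave like 7.0 while the target is 8.0, which is where the race comes from.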

       

            Assignee: Unassigned
            Reporter: Huayu Ouyang (huayu.ouyang@mongodb.com)
            Votes: 0
            Watchers: 9

              Created:
              Updated: