Core Server: SERVER-58294

setFeatureCompatibilityVersion can cause a crash after stepping down

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:

      Repro test:
      0001-repro-bf-21787.patch

      ./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding --log=file jstests/sharding/bf-21787
      
    • Sprint: Repl 2021-08-09, Repl 2021-08-23
    • 110

      setFeatureCompatibilityVersion reads the in-memory FCV value and then calls FeatureCompatibilityVersion::updateFeatureCompatibilityVersionDocument, which uses that value to look up the corresponding transitional FCV state and fasserts if no such state is found.

      The problem is that the node could have stepped down at any point in this sequence without setFCV noticing yet, because:
      a. When setFCV acquires the FCV lock in exclusive mode, it uses the non-interruptible variant of the ExclusiveLock constructor, so a stepdown does not interrupt the lock acquisition.
      b. The node could even step down after setFCV has acquired the lock but before it reads the in-memory FCV value.

      Then, if the new primary runs setFCV and the former primary, which is still running its own setFCV, replicates that FCV change, the former primary may attempt to look up an invalid FCV transition and fassert.
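
      The failing lookup can be illustrated with a toy transition table. This is a minimal sketch, not MongoDB's code: the FCV enum, kTransitions and lookUpTransition are hypothetical, and the real table has more entries. In this simplified model only steady-state values are valid starting points, so if the former primary reads a transitional value that it replicated from the new primary, the (read FCV, requested FCV) pair has no entry. Here std::nullopt marks the spot where the real code would fassert.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>

// Hypothetical, simplified FCV values; not MongoDB's actual enum.
enum class FCV { k50, k51, kUpgrading50To51, kDowngrading51To50 };

// Hypothetical transition table: (FCV read by setFCV, requested FCV) -> transitional FCV.
// Only steady-state starting points have entries, so a transitional value
// read after replicating another primary's setFCV has no match.
const std::map<std::pair<FCV, FCV>, FCV> kTransitions = {
    {{FCV::k50, FCV::k51}, FCV::kUpgrading50To51},
    {{FCV::k51, FCV::k50}, FCV::kDowngrading51To50},
};

// Returns the transitional FCV, or std::nullopt where the real code would fassert.
std::optional<FCV> lookUpTransition(FCV readFcv, FCV requestedFcv) {
    auto it = kTransitions.find({readFcv, requestedFcv});
    if (it == kTransitions.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

      For example, lookUpTransition(FCV::k50, FCV::k51) yields the upgrading state, while lookUpTransition(FCV::kUpgrading50To51, FCV::k51) finds nothing. That second call corresponds to the stale, stepped-down node's case.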

      To address this, FeatureCompatibilityVersion::updateFeatureCompatibilityVersionDocument should check whether the opCtx has been interrupted.
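
      Below is a rough sketch of the proposed fix. All of these names are hypothetical simplifications rather than MongoDB's real types or signatures: OperationContext, killedByStepDown, InterruptedDueToStepDown and updateFcvDocument. The sketch assumes that a stepdown marks the operation as killed before the replicated FCV change is observed. Under that assumption, checking for interruption before the lookup turns the fassert path into an ordinary, catchable error.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical, simplified FCV values; not MongoDB's actual enum.
enum class FCV { k50, k51, kUpgrading50To51 };

// Hypothetical stand-in for a retryable "interrupted due to stepdown" error.
struct InterruptedDueToStepDown : std::runtime_error {
    InterruptedDueToStepDown() : std::runtime_error("operation interrupted by stepdown") {}
};

// Hypothetical stand-in for an operation context that a stepdown can mark as killed.
struct OperationContext {
    std::atomic<bool> killedByStepDown{false};

    // Throws if the operation has been killed; otherwise returns normally.
    void checkForInterrupt() const {
        if (killedByStepDown.load()) {
            throw InterruptedDueToStepDown();
        }
    }
};

const std::map<std::pair<FCV, FCV>, FCV> kTransitions = {
    {{FCV::k50, FCV::k51}, FCV::kUpgrading50To51},
};

// Hypothetical simplification of updateFeatureCompatibilityVersionDocument:
// the interruption check runs before the lookup, so a node that has stepped
// down fails the command with a catchable error instead of aborting.
FCV updateFcvDocument(const OperationContext& opCtx, FCV readFcv, FCV requestedFcv) {
    opCtx.checkForInterrupt();
    auto it = kTransitions.find({readFcv, requestedFcv});
    if (it == kTransitions.end()) {
        std::abort();  // stands in for fassert
    }
    return it->second;
}
```

      A caller that catches InterruptedDueToStepDown can report a retryable error to the client. With this design, a failure that is only reachable after losing primacy no longer takes down the process.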

            Assignee:
            vesselina.ratcheva@mongodb.com Vesselina Ratcheva (Inactive)
            Reporter:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Votes:
            0
            Watchers:
            9
