SERVER-47636 (Core Server)

Force reconfig running concurrently with step up can cause reconfig in drain mode to fail

    • Fully Compatible
    • ALL
    • v4.4

      Applying this diff (force_reconfig_drain_mode_repro.diff) on this commit and running the following commands should reproduce the bug:

      ninja -j400 build/ninja/mongo/db/repl/db_repl_coordinator_test
      build/ninja/mongo/db/repl/db_repl_coordinator_test  --suite ReplCoordTest --filter NodeReturnsNotMasterWhenRunningForceReconfigWhileInDrainMode
      
    • Repl 2020-05-04
    • 42

      After a node has been elected primary and has drained the ops from its buffer, it checks whether it needs to run a reconfig to increment its config term. It performs this check under the replication coordinator mutex, but releases the mutex before running the reconfig. If a force reconfig is running concurrently, it may install a new config with term -1 after we do the check and release the mutex, but before we run the reconfig. In that case, we then try to run a reconfig that sets the config version to the version installed by the force reconfig and the config term to the node's current term. If the force reconfig installed version 'version' and the node's current term is 'term', we attempt a reconfig to (version, term) while our current config is (version, -1). Since we ignore terms for config comparison if either term is -1, only the versions are compared, and they are equal, so the new config fails the validation check that it must be newer than the current config. We return this error and then fassert.
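
The failing comparison can be illustrated with a small model. This is only a sketch: the names `ConfigVersionAndTerm` and `is_newer` are hypothetical, and the ordering used when both terms are initialized (term first, then version) is an assumption. The only rule taken from the description above is that terms are ignored when either one is -1.

```python
from dataclasses import dataclass

UNINITIALIZED_TERM = -1  # term carried by a config installed via force reconfig


@dataclass(frozen=True)
class ConfigVersionAndTerm:
    """Hypothetical (version, term) pair identifying a replica set config."""
    version: int
    term: int


def is_newer(new: ConfigVersionAndTerm, current: ConfigVersionAndTerm) -> bool:
    """Return True if `new` counts as newer than `current`.

    Terms are ignored when either one is -1, so only versions are compared.
    Otherwise (assumption) configs are ordered by term, then by version.
    """
    if UNINITIALIZED_TERM in (new.term, current.term):
        return new.version > current.version
    return (new.term, new.version) > (current.term, current.version)


# Without the race: the drain-mode reconfig only bumps the term.
no_race = is_newer(ConfigVersionAndTerm(4, 3), ConfigVersionAndTerm(4, 2))

# With the race: a force reconfig installed (5, -1) after the check,
# and the drain-mode reconfig now proposes (5, 3).
with_race = is_newer(ConfigVersionAndTerm(5, 3), ConfigVersionAndTerm(5, -1))

print(no_race, with_race)
```

In this model the second comparison falls back to comparing version 5 against version 5, so the proposed config is not considered newer, which corresponds to the validation failure described above.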

      To address this, we may want to consider preventing force reconfigs from running on a node while it is in drain mode. For non-force reconfigs, we should already prevent this since we check canAcceptNonLocalWrites, but we bypass these checks for force reconfigs, since they can run on a secondary.
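
One possible shape for such a guard is sketched below. It is not the server's implementation: `check_reconfig_allowed`, its parameters, and `NotPrimaryError` are hypothetical names chosen for illustration. It assumes the only change is an extra drain-mode check on the force path, with force reconfigs on a secondary still allowed.

```python
class NotPrimaryError(Exception):
    """Hypothetical stand-in for the error returned to the reconfig caller."""


def check_reconfig_allowed(*, force: bool,
                           can_accept_non_local_writes: bool,
                           in_drain_mode: bool) -> None:
    """Raise NotPrimaryError if the reconfig must be rejected in this state."""
    if force:
        # Force reconfigs skip the write-availability check because they may
        # run on a secondary, so drain mode has to be checked explicitly.
        if in_drain_mode:
            raise NotPrimaryError("force reconfig rejected while in drain mode")
        return
    # Non-force reconfigs are rejected whenever the node cannot accept
    # non-local writes, which the description above says covers drain mode.
    if not can_accept_non_local_writes:
        raise NotPrimaryError("reconfig requires a writable primary")
```

With this guard, a force reconfig on an ordinary secondary still passes, while one issued during drain mode is rejected before it can install a config with term -1.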

            Assignee:
            william.schultz@mongodb.com Will Schultz
            Reporter:
            william.schultz@mongodb.com Will Schultz