  1. Core Server
  2. SERVER-59409

Race between reconfig replication and stepup can cause RSM to be stuck in reporting ReplicaSetNoPrimary

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.3.0, 4.4.11, 5.0.4, 5.1.0-rc1
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Major Change
    • ALL
    • v5.1, v5.0, v4.4
    • Repl 2021-09-06, Repl 2021-09-20
    • 152

      Currently, replica set nodes report their configVersion as the setVersion to the RSM for topology management purposes. The TopologyManager tracks the max setVersion it has seen so far, and for any primary that reports a configVersion < maxSetVersion, the RSM sets the status of the node to UNKNOWN because it considers that node a stale primary.
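
The stale-primary rule described above can be sketched as follows. This is a minimal illustration, not the server's implementation: the class `TopologySketch`, its method `on_primary_response`, and the returned status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TopologySketch:
    """Hypothetical stand-in for the RSM's setVersion tracking."""

    max_set_version: int = 0

    def on_primary_response(self, set_version: int) -> str:
        # A primary reporting a setVersion below the max seen so far
        # is assumed to be stale and is marked UNKNOWN.
        if set_version < self.max_set_version:
            return "UNKNOWN"
        # Otherwise accept it and remember the highest setVersion seen.
        self.max_set_version = set_version
        return "PRIMARY"


topology = TopologySketch()
first = topology.on_primary_response(5)   # accepted, max becomes 5
second = topology.on_primary_response(6)  # accepted, max becomes 6
third = topology.on_primary_response(5)   # rejected: 5 < 6
```

Note that the max only ever moves forward in this sketch, so a primary that legitimately reports a lower setVersion is never accepted again.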

      There is an existing race condition: if a user performs a reconfig that bumps the configVersion from V to V + 1 and a new primary steps up before it has applied configVersion V + 1, the replica set enters a state where the RSM is unable to detect a primary, because it considers the new primary, which still reports configVersion V, to be stale.

      Consider the following:

      1. Perform a reconfig that bumps (configVersion, term) from (V, T) to (V + 1, T). Secondaries are still on (V, T). The RSM sets maxSetVersion to V + 1.
      2. A new primary steps up and sets its own (configVersion, term) to (V, T + 1).
      3. The old primary recognizes (V, T + 1) as a newer config than (V + 1, T), since term is given priority when ordering ConfigVersionAndTerm. The old primary therefore fetches the newer config (V, T + 1) to replace its own config.
      4. The RSM does not recognize the new primary as "up-to-date", since it is still reporting setVersion V while maxSetVersion is V + 1.
      5. The RSM sets the new primary's status to UNKNOWN and reports a topology of ReplicaSetNoPrimary.
        The RSM and the replica set stay out of sync, with no way to recover without manual intervention.
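
The sequence above can be traced with a small simulation. This is a sketch under stated assumptions: the tuple type below only borrows the name ConfigVersionAndTerm from the ticket, the ordering is simplified to "term first, then version" and ignores any special cases the real comparison may apply, and `is_newer` and `simulate_race` are hypothetical helpers.

```python
from typing import NamedTuple, Tuple


class ConfigVersionAndTerm(NamedTuple):
    version: int
    term: int


def is_newer(a: ConfigVersionAndTerm, b: ConfigVersionAndTerm) -> bool:
    """Simplified ordering: term is compared first, then version."""
    return (a.term, a.version) > (b.term, b.version)


def simulate_race(v: int, t: int) -> Tuple[bool, str]:
    # Step 1: the reconfig lands on the old primary only; the RSM observes V + 1.
    old_primary = ConfigVersionAndTerm(v + 1, t)
    max_set_version = old_primary.version

    # Step 2: a node still on version V steps up in term T + 1.
    new_primary = ConfigVersionAndTerm(v, t + 1)

    # Step 3: the old primary treats (V, T + 1) as newer than (V + 1, T).
    old_primary_adopts = is_newer(new_primary, old_primary)

    # Steps 4-5: the RSM compares versions only, so the new primary looks stale.
    looks_stale = new_primary.version < max_set_version
    topology = "ReplicaSetNoPrimary" if looks_stale else "ReplicaSetWithPrimary"
    return old_primary_adopts, topology


result = simulate_race(v=5, t=1)
```

The mismatch comes from the two sides ordering configs differently: the nodes order by term first, while the RSM in this sketch looks at the version alone.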

      Since we report that the reconfig failed when we fail to replicate the config to the rest of the nodes, this could prompt users to reissue their reconfig on the new primary. However, this can cause failures in our jstests (our stepdown suites in particular). It also seems problematic that the RSM becomes out of sync with the actual state of the replica set (and is unable to recover), as other components rely on the RSM.

            andrew.shuvalov@mongodb.com Andrew Shuvalov (Inactive)
            jason.chan@mongodb.com Jason Chan