Core Server / SERVER-47331

Rethink the transition from force reconfig to safe reconfig

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.7.0
    • Affects Version/s: None
    • Component/s: Replication
    • Backwards Compatibility: Fully Compatible
    • Sprint: Repl 2020-04-20

      When the current config C0 was installed by a "force" reconfig, the next non-force reconfig with config C1 does not prevent config divergence if:
      1. Reconfig C1 has not yet propagated to a majority of nodes.
      2. A failover happens.
      3. The new primary runs another reconfig with a different config, C2.
      4. C1 and C2 propagate to disjoint sets of nodes.
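
      The sequence above can be sketched as a toy model in Python. It is not MongoDB's implementation: the `Config` type, `is_newer`, and `gossip` are hypothetical, and the ordering rule (compare only versions when either config lacks a term, otherwise compare term and then version) is an assumption consistent with how this issue describes force-reconfig propagation. Node names, versions, and terms are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Config:
    name: str
    version: int
    term: Optional[int]  # None models a config installed by force reconfig


def is_newer(a: Config, b: Config) -> bool:
    """Assumed ordering: if either config lacks a term, compare versions only;
    otherwise compare (term, version) lexicographically."""
    if a.term is None or b.term is None:
        return a.version > b.version
    return (a.term, a.version) > (b.term, b.version)


def gossip(state: Dict[str, Config]) -> Dict[str, Config]:
    """Repeat heartbeat rounds until no node learns a newer config.
    Toy model: every node can exchange heartbeats with every other node."""
    state = dict(state)
    changed = True
    while changed:
        changed = False
        for node in state:
            for peer in state:
                if is_newer(state[peer], state[node]):
                    state[node] = state[peer]
                    changed = True
    return state


C0 = Config("C0", version=2, term=None)  # installed by force reconfig
C1 = Config("C1", version=3, term=5)     # primary in term 5; reaches n0, n1 only
C2 = Config("C2", version=3, term=6)     # new primary n3 in term 6; reaches n3, n4

diverged = {"n0": C1, "n1": C1, "n2": C0, "n3": C2, "n4": C2}
print(sorted({c.name for c in diverged.values()}))  # ['C0', 'C1', 'C2']

converged = gossip(diverged)
print({c.name for c in converged.values()})  # {'C2'}
```

      In the diverged state three configs coexist; only once heartbeats carry C2 to every node does the higher config term settle the divergence.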

      The diverged configs may allow two primaries to be elected in the same term until C2 (which has the higher config term) propagates to a majority of the nodes in C1. SERVER-47119 shows a similar issue with a detailed trace.
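
      Why diverged configs can yield two primaries in one term: a node grants at most one vote per term, so two candidates can both win only if their majorities come from disjoint sets of voters. The hypothetical member sets below each differ from an assumed C0 = {a, b, c, d} by a single voter (C1 removes d, C2 adds e), yet they admit disjoint majorities. `disjoint_majorities` is an illustrative helper, not a server function.

```python
from typing import FrozenSet, Optional, Tuple


def majority(n: int) -> int:
    """Smallest number of votes that is a strict majority of n voters."""
    return n // 2 + 1


def disjoint_majorities(
    a: FrozenSet[str], b: FrozenSet[str]
) -> Optional[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Return disjoint majorities (qa, qb) of voter sets a and b, or None.

    Greedy: qa takes voters outside b first, so it uses as few of b's voters
    as possible; qb is then chosen from b's voters that qa did not take."""
    qa_candidates = sorted(a - b) + sorted(a & b)
    qa = frozenset(qa_candidates[: majority(len(a))])
    remaining = sorted(b - qa)
    if len(remaining) < majority(len(b)):
        return None
    return qa, frozenset(remaining[: majority(len(b))])


C1 = frozenset("abc")    # hypothetical: C1 removes member d from C0
C2 = frozenset("abcde")  # hypothetical: C2 adds member e to C0

pair = disjoint_majorities(C1, C2)
if pair is not None:
    qa, qb = pair
    print(sorted(qa), sorted(qb))  # ['a', 'b'] ['c', 'd', 'e']

# Within a single config, every two majorities overlap.
print(disjoint_majorities(frozenset("abcd"), frozenset("abcd")))  # None
```

      A candidate holding C1 can win with votes from a and b while a candidate holding C2 wins the same term with votes from c, d, and e.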

      In the Initial Sync Semantics project, we will add new nodes with votes: 0 and then run an automatic reconfig to grant them votes. The reconfig that adds a node is exposed to the rare but unsafe case described above. Once that first reconfig gets past the unsafe window and is committed, the subsequent automatic reconfigs will be safe.
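
      A minimal sketch of the two successive config documents, assuming a one-member starting config and hypothetical host names. The field names (`_id`, `version`, `members`, `host`, `votes`, `priority`) follow the replica set config document; the helpers `add_non_voting_member` and `grant_vote` only build documents and are not how the server performs its automatic reconfig. A member with `votes: 0` must also have `priority: 0`.

```python
import copy
from typing import Any, Dict


def add_non_voting_member(cfg: Dict[str, Any], host: str, member_id: int) -> Dict[str, Any]:
    """Step 1 (hypothetical helper): add the new node with votes: 0 and priority: 0."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg["version"] += 1
    new_cfg["members"].append({"_id": member_id, "host": host, "votes": 0, "priority": 0})
    return new_cfg


def grant_vote(cfg: Dict[str, Any], member_id: int) -> Dict[str, Any]:
    """Step 2 (hypothetical helper): a follow-up reconfig that gives the member a vote."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg["version"] += 1
    for member in new_cfg["members"]:
        if member["_id"] == member_id:
            member["votes"] = 1
    return new_cfg


base = {
    "_id": "rs0",
    "version": 4,
    "members": [{"_id": 0, "host": "node0:27017", "votes": 1, "priority": 1}],
}
step1 = add_non_voting_member(base, "node1:27017", member_id=1)
step2 = grant_vote(step1, member_id=1)
print(step1["version"], step2["version"])  # 5 6
```

      Only step 1 changes membership; if it is the first reconfig after a force reconfig, it is the one exposed to the divergence window.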

      One way to avoid the unsafe case is to run an automatic reconfig right after a force reconfig, one that increments the config version and sets a config term. Every reconfig after that one would be safe. However, users usually run a "force" reconfig when the replica set is unstable and they are willing to risk losing committed data, so that may not be the right time to run such an automatic reconfig.
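
      The proposed follow-up reconfig can be sketched as a transformation of the config document. This assumes, following the description, that a force-installed config simply has no `term` field; `followup_after_force` and the host name are hypothetical.

```python
import copy
from typing import Any, Dict


def followup_after_force(forced_cfg: Dict[str, Any], current_term: int) -> Dict[str, Any]:
    """Hypothetical automatic reconfig run after a force reconfig:
    bump the version and stamp the config with the current term."""
    new_cfg = copy.deepcopy(forced_cfg)
    new_cfg["version"] += 1
    new_cfg["term"] = current_term
    return new_cfg


forced = {"_id": "rs0", "version": 7, "members": [{"_id": 0, "host": "node0:27017"}]}
auto = followup_after_force(forced, current_term=3)
print(auto["version"], auto["term"])  # 8 3
```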

      Even worse, the automatic reconfig may interrupt the propagation of the "force" reconfig. For example, suppose the current config C0 has 5 nodes and a force reconfig C1 runs on a secondary to turn that secondary into a single-node replica set. C1 increments the version but drops the config term, and it propagates to the other nodes on their next heartbeats; nodes in C0 become REMOVED once they learn C1. However, if an automatic reconfig C2 then runs on the single-node replica set, C2 carries a term, so it can only propagate if its term is higher than C0's, which may not hold if another election has occurred in C0. As a result, C2 may be unable to propagate to the nodes still in C0. If the terms are equal, the nodes in C0 end up with a config that has diverged from C2; they stay alive and keep sending heartbeats to the single-node replica set. Whichever of C0 and C2 later has the higher term will propagate to the other side, potentially overriding the force reconfig.
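
      The propagation problem can be checked with a toy ordering like the one sketched for the divergence scenario. Assumptions: configs without a term are compared by version only, configs with terms by (term, version), and, as the description implies, further elections in C0 raise C0's config term. `Config`, `is_newer`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    name: str
    version: int
    term: Optional[int]  # None: installed by force reconfig, no config term


def is_newer(a: Config, b: Config) -> bool:
    # Assumed rule: if either side has no term, only versions are compared.
    if a.term is None or b.term is None:
        return a.version > b.version
    return (a.term, a.version) > (b.term, b.version)


C0 = Config("C0", version=1, term=5)     # original 5-node config
C1 = Config("C1", version=2, term=None)  # force reconfig to a single-node set
C2 = Config("C2", version=3, term=6)     # automatic reconfig on the single node

# C1 carries no term, so it would propagate to C0's nodes on version alone.
print(is_newer(C1, C0))  # True

# Before C1 reaches them, further elections in C0 raise C0's term to 7.
C0_after = Config("C0", version=1, term=7)
print(is_newer(C2, C0_after))  # False: C2 cannot reach the nodes still in C0
print(is_newer(C0_after, C2))  # True: C0 would win if it propagated to the single node
```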

    • Assignee: Siyuan Zhou (siyuan.zhou@mongodb.com)
    • Reporter: Siyuan Zhou (siyuan.zhou@mongodb.com)