  1. Core Server
  2. SERVER-46897

REMOVED node may never send heartbeat to fetch newest config

    • Fully Compatible
    • ALL
    • v4.4, v4.2, v4.0, v3.6
      var replTest = new ReplSetTest({nodes: 2});
      const nodes = replTest.startSet();
      replTest.initiateWithHighElectionTimeout();
      
      var primary = replTest.getPrimary();
      var secondary = replTest.getSecondary();
      let config = replTest.getReplSetConfigFromNode();
      let origConfig = Object.assign({}, config);
      
      jsTestLog("Starting reconfigs.");
      
      // Reconfig from {n0,n1} -> {n0}.
      // n1 will now be REMOVED.
      config.version++;
      config.members = [origConfig.members[0]];
      jsTestLog("Reconfiguring to members: " + tojsononeline(config.members.map(m => m._id)) +
                " with version: " + config.version);
      assert.commandWorked(primary.adminCommand({replSetReconfig: config, maxTimeMS: 5000}));
      
      // Wait for the config to propagate to n1 so it enters REMOVED.
      sleep(4000);
      
      // No-op reconfig from {n0} -> {n0}.
      // n1 is still REMOVED.
      config.version++;
      jsTestLog("Reconfiguring to members: " + tojsononeline(config.members.map(m => m._id)) +
                " with version: " + config.version);
      assert.commandWorked(primary.adminCommand({replSetReconfig: config, maxTimeMS: 5000}));
      
      // Reconfig from {n0} -> {n0, n1}.
      // n1 was previously REMOVED, but will now be added back in. It should be able to get the new config
      // eventually.
      config.version++;
      config.members = [origConfig.members[0], origConfig.members[1]];
      jsTestLog("Reconfiguring to members: " + tojsononeline(config.members.map(m => m._id)) +
                " with version: " + config.version);
      assert.commandWorked(primary.adminCommand({replSetReconfig: config, maxTimeMS: 5000}));
      
      replTest.awaitNodesAgreeOnConfigVersion();
      replTest.stopSet();
    • Repl 2020-04-06
    • 36

      When a replica set node installs a config that it is not a member of, it enters the REMOVED state. While REMOVED, it records every node that sends it a heartbeat request in a seed list. Suppose node n1 is REMOVED and receives a heartbeat from node n0: n1 adds n0 to its seed list. If n1 then learns of a newer config that it is still not a member of, it installs that config and cancels its outgoing heartbeats. It does not reschedule any heartbeats, because it is still REMOVED in the new config, and it does not clear its seed list, because that only happens when heartbeats are restarted. At this point n1 is REMOVED, its seed list contains n0, and it is not heartbeating any other node.

      If n0 then executes a reconfig that adds n1 back into the set, n1 will never learn of it, because n1 only schedules a heartbeat to fetch a config when its seed list changes, and n0 is already in the list. n1 will remain on a stale config indefinitely. To fix this issue, we may want to clear the seed list any time a node installs a new config that it is not a member of.
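
The sequence described here can be illustrated with a small toy model in plain JavaScript. This is only a sketch of the reasoning, not the server's replication coordinator code: the class `RemovedNodeModel`, its methods, and the `clearSeedListOnRemovedInstall` option are all hypothetical names invented for illustration, and the model assumes that a config-fetching heartbeat is scheduled only when the seed list changes.

```javascript
// Toy model of a REMOVED node's seed-list handling.
// All names are hypothetical; this does not mirror MongoDB internals.
class RemovedNodeModel {
    constructor(name, {clearSeedListOnRemovedInstall = false} = {}) {
        this.name = name;
        this.configVersion = 0;
        this.seedList = new Set();     // nodes that have sent us heartbeats
        this.pendingHeartbeats = [];   // targets of scheduled outgoing heartbeats
        this.clearSeedListOnRemovedInstall = clearSeedListOnRemovedInstall;
    }

    // Another node sent this node a heartbeat request.
    receiveHeartbeatFrom(sender) {
        const sizeBefore = this.seedList.size;
        this.seedList.add(sender);
        // Assumption: a heartbeat is scheduled only if the seed list changed.
        if (this.seedList.size !== sizeBefore) {
            this.pendingHeartbeats.push(sender);
        }
    }

    // This node installs a newer config that still does not include it.
    installConfigWithoutSelf(version) {
        this.configVersion = version;
        this.pendingHeartbeats = [];   // outgoing heartbeats are cancelled
        if (this.clearSeedListOnRemovedInstall) {
            this.seedList.clear();     // the fix proposed in this ticket
        }
    }

    canFetchNewerConfig() {
        return this.pendingHeartbeats.length > 0;
    }
}

function simulate(options) {
    const n1 = new RemovedNodeModel("n1", options);
    n1.installConfigWithoutSelf(2);  // {n0,n1} -> {n0}: n1 becomes REMOVED
    n1.receiveHeartbeatFrom("n0");   // n0 joins the seed list
    n1.installConfigWithoutSelf(3);  // no-op reconfig: heartbeats cancelled
    n1.receiveHeartbeatFrom("n0");   // n0 re-adds n1 and heartbeats it again
    return n1.canFetchNewerConfig();
}

const stuckWithoutFix = !simulate({clearSeedListOnRemovedInstall: false});
const recoversWithFix = simulate({clearSeedListOnRemovedInstall: true});
```

In this model, the second heartbeat from n0 leaves the seed list unchanged unless the list was cleared when the intermediate config was installed, so only the variant with the proposed fix schedules a heartbeat that could fetch the newest config.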

            Assignee:
            william.schultz@mongodb.com Will Schultz
            Reporter:
            william.schultz@mongodb.com Will Schultz