Configs should be rejected if they make it impossible to elect a Primary


    • Replication
    • Repl 2025-06-23, Repl 2025-07-07

      Attached is a jstest demonstrating a config that causes the replica set to be unable to elect a primary without external intervention.

      Our config validation logic should reject this config, and ideally any other config that leads to the same outcome: no primary can be elected until the config is changed again.
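
      As a rough illustration of the kind of check being requested, the sketch below flags the pattern from the attached test. It is not server code: findUnelectableConfigProblems and valueOr are hypothetical names, and the rule "exactly one electable member plus chainingAllowed: false" is only one possible heuristic, based on the scenario in this ticket. It assumes the documented defaults of priority 1, votes 1, and chainingAllowed true.

```javascript
// Hypothetical helper (not part of mongod or the shell): returns `fallback`
// when a config field is absent, mirroring the documented defaults.
function valueOr(value, fallback) {
  return value === undefined ? fallback : value;
}

// Hypothetical validator sketch: returns a list of human-readable reasons why
// a replica set config might leave the set unable to elect a primary.
function findUnelectableConfigProblems(conf) {
  const problems = [];
  const members = conf.members || [];
  const settings = conf.settings || {};

  // A member can be elected only if it has priority > 0, has a vote,
  // and is not an arbiter.
  const electable = members.filter(
    (m) =>
      valueOr(m.priority, 1) > 0 &&
      valueOr(m.votes, 1) > 0 &&
      !m.arbiterOnly
  );

  if (electable.length === 0) {
    problems.push("no member is electable");
  }

  // Assumption taken from this ticket: with chaining disabled, a lone
  // electable member that falls behind cannot catch up from the other
  // members, so nobody can win an election.
  if (
    electable.length === 1 &&
    members.length > 1 &&
    valueOr(settings.chainingAllowed, true) === false
  ) {
    problems.push(
      "only one electable member and chainingAllowed is false"
    );
  }

  return problems;
}
```

      Applied to the config built in the test below (members 1 and 2 at priority 0, chainingAllowed: false), this sketch would report one problem. A real fix would need to decide whether such configs are rejected outright or only warned about.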

      //
      // N0(p), N1(s), N2(s)
      // N1 and N2 have priority 0, cannot be primary
      // N0 gets restarted, and after oplog truncation, is behind N1 and N2
      //
      // Neither secondary can step up, and N0 cannot sync its missing oplog entries
      // with chainingAllowed: false, so it cannot be primary either.
      //
      
      import {ReplSetTest} from "jstests/libs/replsettest.js";
      
      let name = "no_stepup_when_primary_behind";
      let rst = new ReplSetTest({
        name: name,
        nodes: {
          n0: {},
          n1: {},
          n2: {},
        },
      });
      rst.startSet();
      
      const conf = rst.getReplSetConfig();
      conf.members[1].priority = 0;
      conf.members[2].priority = 0;
      conf.settings = {
        heartbeatIntervalMillis: 500,
        electionTimeoutMillis: 5000,
        chainingAllowed: false,
      };
      
      rst.initiate(conf);
      rst.awaitNodesAgreeOnPrimary();
      
      const primary = rst.getPrimary();
      jsTestLog(`PRIMARY=${primary.name}`);
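      // Pause journal flushing on the primary so the writes below are not made durable there.
      assert.commandWorked(primary.adminCommand({
        configureFailPoint: "pauseJournalFlusherThread",
        mode: "alwaysOn",
      }));
      assert.commandWorked(primary.adminCommand({
        configureFailPoint: "pauseJournalFlusherBeforeFlush",
        mode: "alwaysOn",
      }));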
      primary.getDB("test").test.insert({key0: "value0"});
      primary.getDB("test").test.insert({key1: "value1"});
      
      const primaryId = rst.getNodeId(primary);
      rst.stop(primaryId, MongoRunner.EXIT_SIGKILL, {forRestart: true, allowedExitCode: MongoRunner.EXIT_SIGKILL});
      rst.start(primaryId, {waitForConnect: true}, true);
      jsTestLog("kill+restart complete");
      
      // Current behavior: the set is left with no primary. This should no longer
      // happen once mongod rejects (or otherwise handles) this configuration.
      rst.awaitNoPrimary();
      rst.stopSet();

            Assignee: Joseph Obaraye
            Reporter: Myles Hathcock