SERVER-45906 (Core Server)

Initial stable checkpoint not triggered properly when enableMajorityReadConcern=false

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.4, 4.3.4
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2
    • Steps To Reproduce:
      let rst = new ReplSetTest({
          nodes: 1,
          nodeOptions: {
              enableMajorityReadConcern: "false",
              // Prevent frequent checkpoints by background thread.
              syncdelay: 60,
              setParameter: {logComponentVerbosity: tojson({storage: 2})}
          }
      });
      rst.startSet();
      rst.initiate();
       
      let primary = rst.getPrimary();
      let coll = primary.getDB("test")["test"];
      assert.commandWorked(coll.insert({x: 1}));
       
      jsTestLog("Kill and restart the node.");
      rst.stop(0, 9, {allowedExitCode: MongoRunner.EXIT_SIGKILL}, {forRestart: true});
      rst.start(0, undefined, true /* restart */);
       
      jsTestLog("Wait for primary.");
      let timeout = 20 * 1000;
      primary = rst.getPrimary(timeout);
       
      rst.stopSet();
      

    • Linked BF Score:
      30

      Description

      When enableMajorityReadConcern (EMRC) is false, we set the stable timestamp to the lastApplied timestamp instead of requiring that it is behind the majority commit point. As a result, on replica set initiation the stable timestamp is set to the lastApplied optime as soon as we write down the first "initiating set" oplog entry, and only afterwards is initialDataTimestamp (IDT) set to that same lastApplied value. The condition for triggering an initial checkpoint checks whether the stable timestamp advanced from a value less than the IDT to a value greater than the IDT. Because stableTimestamp=lastApplied is set before initialDataTimestamp=lastApplied, the previous stable timestamp is never less than the IDT on any later advance, so the condition never becomes true.

      Since the initial checkpoint is never triggered, the first checkpoint is taken only after the normal syncdelay interval, which is 1 minute by default. If the node shuts down uncleanly after creating the initial replication collections, such as local.system.replset, but before that first checkpoint, this data may be lost, causing the node to think it is no longer part of a set it initiated. We should make sure an initial checkpoint is triggered when we first set a stable timestamp with enableMajorityReadConcern=false.
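
      The ordering problem described above can be illustrated with a small standalone model. This is only a sketch, not the server's code: the function names, the event format, the use of plain numbers as timestamps, and the use of 0 for "not set yet" are all assumptions made for illustration. The strict "greater than" comparison follows the wording of the description; the server's actual comparison may differ.

```javascript
// Hypothetical model of the trigger condition described above; not the server's code.
// Timestamps are plain numbers, and 0 stands in for "not set yet" (an assumption).
function shouldTriggerInitialCheckpoint(prevStable, initialData, newStable) {
    // Fires only when the stable timestamp crosses the IDT from below.
    // Strict ">" follows the description; the real comparison may be inclusive.
    return prevStable < initialData && newStable > initialData;
}

// Replays a sequence of events and reports whether the initial checkpoint ever fired.
function replay(events) {
    let stable = 0;
    let initialData = 0;
    let triggered = false;
    for (const ev of events) {
        if (ev.type === "setInitialData") {
            initialData = ev.ts;
        } else if (ev.type === "setStable") {
            if (shouldTriggerInitialCheckpoint(stable, initialData, ev.ts)) {
                triggered = true;
            }
            stable = ev.ts;
        }
    }
    return triggered;
}

// Ordering from the description with enableMajorityReadConcern=false:
// the stable timestamp is set to lastApplied (5) first, then the IDT is set to 5.
const buggyOrder = replay([
    {type: "setStable", ts: 5},
    {type: "setInitialData", ts: 5},
    {type: "setStable", ts: 7},
]);

// Contrasting ordering: the IDT is set first and the stable timestamp later moves past it.
const crossingOrder = replay([
    {type: "setInitialData", ts: 5},
    {type: "setStable", ts: 7},
]);

console.log({buggyOrder, crossingOrder});
```

      In this model buggyOrder is false and crossingOrder is true: once the stable timestamp already equals the IDT, no later advance can cross the IDT from below, so the initial checkpoint is never requested.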

            People

            Assignee:
            william.schultz William Schultz (Inactive)
            Reporter:
            william.schultz William Schultz (Inactive)