SERVER-58721 (Core Server)

processReplSetInitiate does not set a stableTimestamp or take a stable checkpoint

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.0.4, 5.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Fully Compatible
    • ALL
    • v5.0, v4.4, v4.2, v4.0
    • Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
    • 29

      processReplSetInitiate may trigger the checkpointer thread, but it does not guarantee that the node ends up with either a stableTimestamp or a stable checkpoint. If the commit point (lastCommitted) does not advance soon enough for us to call setStableTimestamp on the storage API, the node can enter a rollback attempt without a stableTimestamp.

      This bug has likely existed since the RTT (recover to a timestamp) rollback algorithm was introduced, but it became more prevalent after we added the automatic safe reconfig on stepup in 5.0, which writes a no-op oplog entry immediately after a node is elected primary.

      This bug should be reproducible with the following steps:

      • Start a 3-node replica set with a failpoint that hangs the checkpointer thread on one of the nodes (Node0).
      • Node0 should become the original primary. (It might be necessary to stop data replication on the other two nodes at this point so that Node0 cannot advance its commit point.)
      • Isolate Node0 and do a few writes on it.
      • Elect a new primary from Node1 and Node2 and make sure there are committed writes. Make sure the new term is higher than Node0's, which will prevent Node0 from learning of the commit point from Node1 or Node2.
      • Reconnect Node0 to Node1 and Node2. Node0 should attempt to go into rollback and hit an invariant because it does not have a stableTimestamp.
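
      The sequence above can be sketched as a small toy model in Python. This is not server code: ToyNode, NoStableTimestamp, advance_commit_point and reproduce are hypothetical names made up for illustration, timestamps are plain integers, and the rule that a commit point from another term is ignored is a simplification of the term check described in the steps. It only shows why a node that never set a stable timestamp has nothing to roll back to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class NoStableTimestamp(Exception):
    """Stands in for the server invariant that fails at rollback time."""


@dataclass
class ToyNode:
    # Hypothetical model of one replica set member; not the real server state.
    name: str
    term: int = 0
    oplog: List[int] = field(default_factory=list)  # toy integer timestamps
    last_committed: Optional[int] = None            # commit point
    stable_timestamp: Optional[int] = None

    def initiate(self) -> None:
        # Models the reported behavior: the initiate write reaches the oplog,
        # but nothing here sets a stable timestamp.
        self.term = 1
        self.oplog.append(1)

    def write(self, ts: int) -> None:
        self.oplog.append(ts)

    def advance_commit_point(self, ts: int, from_term: int) -> None:
        # Simplifying assumption: a commit point from another term is ignored.
        if from_term != self.term:
            return
        self.last_committed = ts
        self.stable_timestamp = ts  # toy stand-in for setStableTimestamp

    def rollback(self) -> int:
        # Rollback needs a stable timestamp to recover to.
        if self.stable_timestamp is None:
            raise NoStableTimestamp(f"{self.name} has no stable timestamp")
        self.oplog = [ts for ts in self.oplog if ts <= self.stable_timestamp]
        return self.stable_timestamp


def reproduce() -> str:
    node0 = ToyNode("node0")
    node0.initiate()   # Node0 is the original primary in term 1
    node0.write(2)     # writes made while isolated, never committed
    node0.write(3)
    # Node1/Node2 elect a new primary in term 2 and commit timestamp 10;
    # Node0 is still in term 1, so it does not adopt that commit point.
    node0.advance_commit_point(10, from_term=2)
    try:
        node0.rollback()  # Node0 reconnects and must roll back
    except NoStableTimestamp as exc:
        return str(exc)
    return "rolled back"
```

      In this model, reproduce() ends in the NoStableTimestamp error, while a node whose commit point advanced in its own term before it diverged can roll back normally.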

            Assignee:
            vesselina.ratcheva@mongodb.com Vesselina Ratcheva (Inactive)
            Reporter:
            pavithra.vetriselvan@mongodb.com Pavithra Vetriselvan
            Votes:
            0
            Watchers:
            8
