[SERVER-58721] processReplSetInitiate does not set a stableTimestamp or take a stable checkpoint Created: 21/Jul/21  Updated: 29/Oct/23  Resolved: 04/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.4, 5.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Pavithra Vetriselvan
Assignee: Vesselina Ratcheva (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
is related to SERVER-78813 Commit point propagation fails indefi... Closed
is related to SERVER-73975 There will be a freeze when executing... Investigating
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.0, v4.4, v4.2, v4.0
Sprint: Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
Participants:
Linked BF Score: 29

 Description   

processReplSetInitiate may trigger the checkpointer thread, but does not guarantee that we have either a stableTimestamp or a stable checkpoint. If our commit point (lastCommitted) does not advance fast enough before we call setStableTimestamp on the storage API, it's possible for us to enter a rollback attempt without a stableTimestamp.
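
The failure mode can be illustrated with a small toy model. This is only a sketch and not the server's code: `ToyNode`, `NoStableTimestamp`, `on_commit_point_advanced` and `try_rollback` are hypothetical names, timestamps are plain integers, and the `set_stable` branch is an assumed shape of the fix based only on the commit message "Set stable timestamp during replSetInitiate".

```python
from dataclasses import dataclass
from typing import Optional


class NoStableTimestamp(RuntimeError):
    """Stands in for the invariant failure described in this ticket."""


@dataclass
class ToyNode:
    last_applied: int = 0                   # newest oplog timestamp on this node
    stable_timestamp: Optional[int] = None  # None = storage engine was never given one

    def initiate(self, set_stable: bool) -> None:
        # replSetInitiate writes the first oplog entry (timestamp 1 in this toy).
        self.last_applied = 1
        if set_stable:
            # Assumed shape of the fix: hand the storage engine a stable
            # timestamp right away instead of waiting for the commit point.
            self.stable_timestamp = self.last_applied

    def write(self) -> None:
        self.last_applied += 1

    def on_commit_point_advanced(self, commit_point: int) -> None:
        # Without the fix, this is the only path that sets a stable timestamp.
        candidate = min(commit_point, self.last_applied)
        if self.stable_timestamp is None or candidate > self.stable_timestamp:
            self.stable_timestamp = candidate

    def rollback(self) -> int:
        # Rollback recovers to the stable timestamp, so one has to exist.
        if self.stable_timestamp is None:
            raise NoStableTimestamp("rollback attempted without a stableTimestamp")
        self.last_applied = self.stable_timestamp
        return self.stable_timestamp


def try_rollback(set_stable_on_initiate: bool) -> str:
    node = ToyNode()
    node.initiate(set_stable_on_initiate)
    node.write()  # the commit point never advances in this scenario
    try:
        return f"rolled back to {node.rollback()}"
    except NoStableTimestamp as exc:
        return f"failed: {exc}"
```

In this model, `try_rollback(False)` hits the missing-timestamp error, while `try_rollback(True)` has a point to recover to even though the commit point never moved.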

This bug has likely existed since the RTT (recover-to-timestamp) rollback algorithm was introduced, but it became more prevalent after we introduced the automatic safe reconfig on stepup in 5.0, which writes a no-op oplog entry immediately after a node has been elected primary.

This bug should be reproducible with the following steps:

  • Start a 3-node replica set with a failpoint to hang the checkpointer thread on one of the nodes (Node0)
  • Node0 should become the original primary (might have to stop data replication on the other two nodes at this point so that Node0 cannot advance its commit point)
  • Isolate Node0 and do a few writes on it
  • Elect a new primary between Node1 and Node2 and make sure there are committed writes. Make sure the terms are higher than Node0's, which will prevent Node0 from learning of the commit point from Node1 or Node2.
  • Reconnect Node0 to Node1 and Node2. Node0 should attempt to go into rollback and invariant because it does not have a stableTimestamp.
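
The steps above can be walked through in a toy simulation. This is a sketch under stated assumptions rather than a real reproduction against a replica set: `ToyNode`, `learn_commit_point`, `diverges_from` and `run_repro` are hypothetical names, optimes are `(term, timestamp)` pairs, and the rule that a node only accepts a commit point whose term matches its own last applied term is an assumption taken from the wording of step 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

OpTime = Tuple[int, int]  # (term, timestamp)


@dataclass
class ToyNode:
    name: str
    oplog: List[OpTime] = field(default_factory=list)
    commit_point: Optional[OpTime] = None
    stable_timestamp: Optional[int] = None

    def learn_commit_point(self, commit_point: OpTime) -> bool:
        # Assumption: a commit point is accepted only when its term matches
        # the term of this node's last applied entry.
        if commit_point[0] != self.oplog[-1][0]:
            return False
        self.commit_point = commit_point
        self.stable_timestamp = commit_point[1]
        return True

    def diverges_from(self, other: "ToyNode") -> bool:
        # True if this node holds entries the other node does not have.
        return any(entry not in other.oplog for entry in self.oplog)


def run_repro() -> dict:
    initiate_entry: OpTime = (1, 1)
    node0, node1, node2 = (ToyNode(f"node{i}", [initiate_entry]) for i in range(3))

    # Steps 1-2: Node0 is primary in term 1. Its checkpointer is hung and
    # replication is stopped elsewhere, so its commit point never advances.

    # Step 3: Node0 is isolated and accepts a few writes in term 1.
    node0.oplog += [(1, 2), (1, 3)]

    # Step 4: Node1 wins an election in term 2; its writes reach Node2 and commit.
    for ts in (2, 3):
        node1.oplog.append((2, ts))
        node2.oplog.append((2, ts))
    new_commit_point: OpTime = (2, 3)
    node1.learn_commit_point(new_commit_point)
    node2.learn_commit_point(new_commit_point)

    # Step 5: Node0 reconnects. The term mismatch stops it from adopting the
    # commit point, yet its term-1 writes diverge, so it has to roll back.
    learned = node0.learn_commit_point(new_commit_point)
    return {
        "node0_learned_commit_point": learned,
        "node0_must_roll_back": node0.diverges_from(node1),
        "node0_stable_timestamp": node0.stable_timestamp,
        "node1_stable_timestamp": node1.stable_timestamp,
    }
```

In this model Node0 ends up needing a rollback while its stable timestamp is still `None`, which corresponds to the state in which the real server hits the invariant.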


 Comments   
Comment by Githook User [ 21/Oct/21 ]

Author:

{'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}

Message: SERVER-58721 Set stable timestamp during replSetInitiate

(cherry picked from commit b19ac676cd1b00b3325d16b4845bd1d8ededfd32)
Branch: v5.0
https://github.com/mongodb/mongo/commit/5f04bbbc4aa7da24135b4548e4a298ce9d28b777

Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 04/Oct/21 ]

Author:

{'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}

Message: SERVER-58721 Set stable timestamp during replSetInitiate
Branch: master
https://github.com/mongodb/mongo/commit/b19ac676cd1b00b3325d16b4845bd1d8ededfd32

Generated at Thu Feb 08 05:45:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.