Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.4, 5.1.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.0, v4.4, v4.2, v4.0
Sprint: Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
Linked BF Score: 29
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

processReplSetInitiate may trigger the checkpointer thread, but it does not guarantee that we have either a stableTimestamp or a stable checkpoint. If our commit point (lastCommitted) does not advance quickly enough for us to call setStableTimestamp on the storage API, we can enter a rollback attempt without a stableTimestamp.
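
The gap described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: the class and method names (`StorageModel`, `ReplNodeModel`, `trigger_checkpoint`, `can_roll_back`) are hypothetical, and the model assumes the stable timestamp is set only when the commit point advances.

```python
from typing import Optional


class StorageModel:
    """Hypothetical stand-in for the storage layer."""

    def __init__(self) -> None:
        self.stable_timestamp: Optional[int] = None
        self.checkpoint_requested = False

    def trigger_checkpoint(self) -> None:
        # Requesting a checkpoint does not by itself produce a stable one.
        self.checkpoint_requested = True

    def set_stable_timestamp(self, ts: int) -> None:
        self.stable_timestamp = ts


class ReplNodeModel:
    """Hypothetical stand-in for a replica set member."""

    def __init__(self) -> None:
        self.storage = StorageModel()
        self.last_committed: Optional[int] = None

    def initiate(self) -> None:
        # Models initiate: it pokes the checkpointer but sets no stable timestamp.
        self.storage.trigger_checkpoint()

    def advance_commit_point(self, ts: int) -> None:
        # Assumption: the stable timestamp follows the commit point.
        self.last_committed = ts
        self.storage.set_stable_timestamp(ts)

    def can_roll_back(self) -> bool:
        # Rollback to a stable timestamp needs one to exist.
        return self.storage.stable_timestamp is not None


if __name__ == "__main__":
    node = ReplNodeModel()
    node.initiate()
    print("after initiate:", node.can_roll_back())
    node.advance_commit_point(5)
    print("after commit point advances:", node.can_roll_back())
```

In this model, any rollback attempted between `initiate()` and the first `advance_commit_point()` call lands in the window the ticket describes.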

This bug has likely existed since the RTT algorithm was introduced, but it became more prevalent after we introduced the automatic safe reconfig on stepup in 5.0, which writes a no-op oplog entry immediately after a node is elected primary.

This bug should be reproducible with the following steps:

1. Start a 3-node replica set with a failpoint that hangs the checkpointer thread on one of the nodes (Node0).
2. Node0 should become the original primary. (It might be necessary to stop data replication on the other two nodes at this point so that Node0 cannot advance its commit point.)
3. Isolate Node0 and do a few writes on it.
4. Elect a new primary between Node1 and Node2 and make sure there are committed writes. Make sure the terms are higher than Node0's, which will prevent Node0 from learning of the commit point from Node1 or Node2.
5. Reconnect Node0 to Node1 and Node2. Node0 should attempt to go into rollback and hit an invariant because it does not have a stableTimestamp.
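
The reproduction steps can be walked through in a simplified simulation. This is a hedged sketch, not a jstest and not the server's logic: the `Node` class, `run_scenario`, and the optimes are hypothetical. It assumes a node ignores a commit point whose term differs from the term of its own last applied entry, and it models the hung checkpointer simply by never giving Node0 a stable timestamp.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

OpTime = Tuple[int, int]  # (term, timestamp)


@dataclass
class Node:
    name: str
    oplog: List[OpTime] = field(default_factory=list)
    last_committed: Optional[OpTime] = None
    stable_timestamp: Optional[int] = None

    def write(self, term: int, ts: int) -> None:
        self.oplog.append((term, ts))

    def learn_commit_point(self, commit_point: OpTime) -> bool:
        # Assumption: a commit point from a different term is ignored.
        if not self.oplog or self.oplog[-1][0] != commit_point[0]:
            return False
        self.last_committed = commit_point
        self.stable_timestamp = commit_point[1]
        return True

    def has_diverged_from(self, source: "Node") -> bool:
        # The node must roll back if its newest entry is unknown to the source.
        return self.oplog[-1] not in source.oplog

    def rollback(self) -> None:
        if self.stable_timestamp is None:
            raise RuntimeError(f"{self.name}: rollback without a stableTimestamp")
        self.oplog = [op for op in self.oplog if op[1] <= self.stable_timestamp]


def run_scenario() -> Tuple[bool, bool, Optional[str]]:
    # Steps 1-2: all nodes share the first entry; Node0 is primary in term 1
    # and its commit point never advances.
    node0, node1, node2 = Node("Node0"), Node("Node1"), Node("Node2")
    for n in (node0, node1, node2):
        n.write(1, 1)
    # Step 3: Node0 is isolated and takes a few writes.
    node0.write(1, 2)
    node0.write(1, 3)
    # Step 4: Node1 is elected in term 2 and commits a write with Node2.
    node1.write(2, 4)
    node2.write(2, 4)
    commit_point = (2, 4)
    # Step 5: reconnect Node0.
    learned = node0.learn_commit_point(commit_point)
    diverged = node0.has_diverged_from(node1)
    error = None
    if diverged:
        try:
            node0.rollback()
        except RuntimeError as exc:
            error = str(exc)
    return learned, diverged, error


if __name__ == "__main__":
    print(run_scenario())
```

In this model, Node0 cannot adopt the term-2 commit point, finds that its history has diverged, and fails in `rollback()` for lack of a stable timestamp, which mirrors the invariant described in the last step.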

is related to:

SERVER-78813 Commit point propagation fails indefinitely with exhaust cursors with null lastCommitted optime (Closed)
SERVER-73975 There will be a freeze when executing the enableSharding or shardCollection command (Investigating)

Assignee: Vesselina Ratcheva (Inactive)
Reporter: Pavithra Vetriselvan
Participants: Githook User, Pavithra Vetriselvan, Vesselina Ratcheva, Vivian Ge
Votes: 0
Watchers: 8

Created: Jul 21 2021 12:18:15 PM UTC
Updated: Oct 29 2023 09:50:37 PM UTC
Resolved: Oct 04 2021 09:08:19 PM UTC
Confidence Status Last Update: 31/Aug/21 7:36 AM
