Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.3.0-rc0
Affects Version/s: None
Component/s: None
Labels: repl-shortlist
Assigned Teams: Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v8.2, v8.1, v8.0, v7.0
Sprint: Repl 2025-03-31, Repl 2025-06-09, Repl 2025-06-23, Repl 2025-07-07, Repl 2025-07-21, Repl 2025-08-04
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Let's say we're a primary that has won an election in term 2 but is stuck in drain mode for a long time, and our target opTime for primary catchup is TS(1), term: 1. Meanwhile, another node becomes primary and applies an oplog entry at opTime TS(3), term: 3. It seems possible that we could somehow replicate that opTime during catchup mode and end up with a lastApplied in a higher term than the one we were elected in. This is because we simply run catchup (and thus the oplog fetcher and oplog applier) until we reach our target opTime, at which point we abort catchup with a success.
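To make the scenario concrete, here is a minimal Python model of catchup under stated assumptions. `OpTime` and `run_catchup` are hypothetical simplifications, not the server's code, and the model assumes the applier applies whole batches and only afterwards checks whether lastApplied has reached the target. Under that assumption, a single batch that spans the target can carry lastApplied into term 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    """Hypothetical simplified OpTime.

    order=True compares fields in declaration order, so the term is
    compared first and the timestamp only breaks ties.
    """
    term: int
    ts: int


def run_catchup(batches, last_applied, target):
    """Hypothetical model of primary catchup as described in this ticket.

    Whole batches are applied, and the only stopping condition is that
    last_applied has reached the target; the terms of applied entries
    are never inspected.
    """
    for batch in batches:
        if last_applied >= target:
            break  # target reached: catchup is declared successful
        for entry in batch:
            last_applied = entry  # lastApplied advances entry by entry
    return last_applied


election_term = 2
target = OpTime(term=1, ts=1)
# One fetched batch containing the target and a later entry from the term-3 primary.
batches = [[OpTime(term=1, ts=1), OpTime(term=3, ts=3)]]
last_applied = run_catchup(batches, OpTime(term=1, ts=0), target)
print(last_applied, last_applied.term > election_term)  # OpTime(term=3, ts=3) True
```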

This may cause an issue later, when we attempt to transition to writable primary and write our first no-op oplog entry. That entry's timestamp would be the most recent, but its term (our election term) is outdated compared to the term of our lastApplied. As a result, we will trigger this invariant: since we compare terms first, we will see that our no-op OpTime is lower than our lastApplied, yet the no-op's timestamp is more recent, which triggers the invariant failure.
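The comparison problem can be illustrated with a small sketch. The `OpTime` dataclass below is a hypothetical stand-in that orders by term first and timestamp second, matching the ordering described above; `violates_noop_invariant` is likewise a hypothetical approximation of the invariant, not its actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    """Hypothetical simplified OpTime; fields compare in declaration order."""
    term: int  # compared first
    ts: int    # only breaks ties between equal terms


def violates_noop_invariant(noop: OpTime, last_applied: OpTime) -> bool:
    """Hypothetical approximation of the invariant: a no-op with a newer
    timestamp must not compare lower than lastApplied."""
    return noop.ts > last_applied.ts and noop < last_applied


last_applied = OpTime(term=3, ts=3)  # replicated from the term-3 primary during catchup
noop = OpTime(term=2, ts=4)          # first no-op: newest timestamp, stale election term

print(noop < last_applied)                          # True: term 2 < term 3
print(violates_noop_invariant(noop, last_applied))  # True: the two orderings disagree
```

With a no-op written in term 3 instead, both orderings agree and the check passes, which is why the stale election term is the root of the failure.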

Our code already has a mechanism to abort catchup when we see a higher term. However, it only aborts if we are still actively catching up, and that function is only called from certain places, such as state transitions and heartbeats.
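A rough sketch of why that existing guard does not help here. `Catchup`, `CatchupState`, and `abort_if_higher_term` are hypothetical names that model only the two properties stated above: the guard acts only while catchup is still in progress, and it only runs when something (such as a heartbeat) invokes it.

```python
from enum import Enum


class CatchupState(Enum):
    CATCHING_UP = 1
    SUCCEEDED = 2


class Catchup:
    """Hypothetical model of the existing abort-on-higher-term guard."""

    def __init__(self, election_term):
        self.election_term = election_term
        self.state = CatchupState.CATCHING_UP
        self.aborted = False

    def abort_if_higher_term(self, observed_term):
        # The guard is a no-op once catchup is no longer in progress.
        if self.state is not CatchupState.CATCHING_UP:
            return False
        if observed_term > self.election_term:
            self.aborted = True
            return True
        return False


c = Catchup(election_term=2)
c.state = CatchupState.SUCCEEDED  # target reached: catchup already declared successful
# A later heartbeat reports term 3, but the guard no longer fires.
print(c.abort_if_higher_term(3))  # False
```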

To resolve this, I think we should check for a higher term before declaring catchup successful, and abort becoming primary if we discover that we applied an entry from a higher term during this process.
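A minimal sketch of the proposed ordering of checks, assuming a hypothetical `check_catchup` helper evaluated after each applied batch. The higher-term check runs before the target-reached check, so a lastApplied from a newer term aborts the transition instead of being reported as success. What aborting means inside the server (how the node stops becoming primary) is not modeled here.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True, order=True)
class OpTime:
    """Hypothetical simplified OpTime, ordered by term and then timestamp."""
    term: int
    ts: int


class CatchupOutcome(Enum):
    IN_PROGRESS = "in progress"
    SUCCEEDED = "succeeded"
    ABORT_HIGHER_TERM = "aborted: applied an entry from a higher term"


def check_catchup(last_applied: OpTime, target: OpTime, election_term: int) -> CatchupOutcome:
    """Hypothetical check run after each applied batch during catchup."""
    # Check for a higher term first, so the success condition cannot mask it.
    if last_applied.term > election_term:
        return CatchupOutcome.ABORT_HIGHER_TERM
    if last_applied >= target:
        return CatchupOutcome.SUCCEEDED
    return CatchupOutcome.IN_PROGRESS


election_term = 2
target = OpTime(term=1, ts=1)
print(check_catchup(OpTime(term=3, ts=3), target, election_term))  # CatchupOutcome.ABORT_HIGHER_TERM
print(check_catchup(OpTime(term=1, ts=1), target, election_term))  # CatchupOutcome.SUCCEEDED
```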

Assignee: Joseph Obaraye
Reporter: Xuerui Fa
Participants: Githook User, Joseph Obaraye, Xuerui Fa
Votes: 0
Watchers: 11

Created: Mar 07 2025 09:41:47 PM UTC
Updated: Aug 06 2025 06:48:08 PM UTC
Resolved: Aug 06 2025 03:48:05 PM UTC
Confidence Status Last Update: 18/Jun/25 7:54 PM
