- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Replication
- ALL
- Repl 2025-03-31
Let's say we're a primary that won election in term 2 but has been stuck in drain mode for a long time, and our target opTime for primary catchup is (TS(1), term: 1). Meanwhile, another node becomes primary and applies an entry at opTime (TS(3), term: 3). It seems possible that we could replicate that opTime during catchup mode and end up with a lastApplied in a higher term than the one we were elected for. This is because we simply run catchup (and thus the oplog fetcher and oplog applier) until we reach our target opTime, at which point we end catchup and report success.
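The overshoot can be illustrated with a small Python model. This is only a sketch of the scenario, not the server code: `OpTime`, `run_catchup`, and the batch contents are hypothetical, and it assumes a single fetched batch can contain entries past the target.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    """Toy opTime: tuple ordering compares term first, then timestamp."""
    term: int
    ts: int


def run_catchup(last_applied, target, fetched_batches):
    """Apply whole batches until last_applied reaches the target.

    Hypothetical model: the only exit condition is reaching the target,
    so nothing stops us from applying entries in a higher term.
    """
    for batch in fetched_batches:
        if last_applied >= target:
            break
        for entry in batch:
            last_applied = entry
    return last_applied


elected_term = 2
target = OpTime(term=1, ts=1)
# Assumed batch: the target entry plus an entry written by the term-3 primary.
batches = [[OpTime(term=1, ts=1), OpTime(term=3, ts=3)]]

final = run_catchup(OpTime(term=1, ts=0), target, batches)
overshot = final.term > elected_term  # catchup "succeeded" past our own term
```

In this model catchup reports success because `final >= target`, even though `final` is in a higher term than the one we were elected in.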
This may cause an issue later when we attempt to transition to writable primary and write our first no-op oplog entry. The no-op's timestamp would be the most recent, but its term (the term we were elected in) would be lower than the term of our lastApplied. As a result, we will trigger this invariant: since opTimes are compared by term first, the no-op opTime is lower than our lastApplied, yet its timestamp is more recent, and that mismatch causes the invariant failure.
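A minimal Python sketch of the mismatch, assuming opTimes are ordered by term first and timestamp second as described above. The `OpTime` type and the concrete no-op timestamp (4) are assumptions for illustration, not values from the server.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    """Toy opTime: tuple ordering compares term first, then timestamp."""
    term: int
    ts: int


last_applied = OpTime(term=3, ts=3)  # replicated from the term-3 primary
noop = OpTime(term=2, ts=4)          # hypothetical first no-op written in our term

# The opTime comparison says the no-op went backwards...
optime_regressed = noop < last_applied
# ...while the timestamp comparison says it moved forwards.
timestamp_advanced = noop.ts > last_applied.ts
```

Both flags are true at once in this model, which is the inconsistent state the description says the invariant rejects.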
In our code, we already have a mechanism to abort catchup if we see a higher term. But it only aborts if we are still actively catching up, and we only call that function in a few places, including state transitions and heartbeats.
To resolve this, I think we should check for a higher term before we declare catchup successful, and abort becoming primary if we discover that we applied an entry from a higher term during catchup.
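One possible shape for that check, sketched in Python rather than the server's C++. `CatchupResult` and `conclude_catchup` are hypothetical names, and the sketch assumes the term check simply runs ahead of the target comparison.

```python
from enum import Enum
from typing import NamedTuple


class OpTime(NamedTuple):
    """Toy opTime: tuple ordering compares term first, then timestamp."""
    term: int
    ts: int


class CatchupResult(Enum):
    STILL_CATCHING_UP = "still_catching_up"
    SUCCEEDED = "succeeded"
    ABORTED_HIGHER_TERM = "aborted_higher_term"


def conclude_catchup(last_applied, target, elected_term):
    """Decide the catchup outcome, checking for a higher term first."""
    # Proposed guard: if we applied anything from a later term, do not
    # report success; the caller should abort becoming primary.
    if last_applied.term > elected_term:
        return CatchupResult.ABORTED_HIGHER_TERM
    if last_applied >= target:
        return CatchupResult.SUCCEEDED
    return CatchupResult.STILL_CATCHING_UP
```

With the scenario above, `conclude_catchup(OpTime(3, 3), OpTime(1, 1), elected_term=2)` returns `ABORTED_HIGHER_TERM` instead of reporting success.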