If a syncing node has a null lastCommitted optime when it issues its exhaust getMore request to its sync source, it will never receive empty oplog batches for commit point propagation. This is because we neither take into account nor update the last known lastCommitted optime of an exhaust cursor if it was originally null (see here and here).
Normally, a syncing node gets its sync source's lastCommitted optime from the first oplog find request. But if the sync source also had a null lastCommitted optime when the syncing node started to sync from it, then I think we could end up in the situation described above. The elected primary normally relies on the JournalFlusher to trigger the first lastCommitted calculation/update, and this can be delayed because the JournalFlusher runs asynchronously. This could happen after replSetInitiate on 4.4 (see SERVER-58721) or after a node restart on all versions.
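The failure mode described above can be modeled with a small sketch. This is not the server's code: `ExhaustCursor`, `should_send_empty_batch`, and the use of plain integers as optimes are all hypothetical simplifications, and the early return on a null value is an assumption meant only to illustrate the behavior described in this ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExhaustCursor:
    # Hypothetical model: the commit point the syncing node reported in its
    # single exhaust getMore. None stands in for a null optime.
    last_known_committed: Optional[int]


def should_send_empty_batch(cursor: ExhaustCursor,
                            source_last_committed: Optional[int]) -> bool:
    """Return True if the modeled sync source would send an empty batch
    to propagate a newer commit point to the syncing node."""
    if cursor.last_known_committed is None:
        # Modeled bug: a null value is neither compared nor refreshed,
        # so the cursor stays in this state for its whole lifetime.
        return False
    if (source_last_committed is not None
            and source_last_committed > cursor.last_known_committed):
        # Healthy path: remember the new commit point for the next wait.
        cursor.last_known_committed = source_last_committed
        return True
    return False


# A cursor that started with a null commit point never gets an empty batch,
# even after the source's commit point advances.
stuck = ExhaustCursor(last_known_committed=None)
stuck_results = [should_send_empty_batch(stuck, t) for t in (None, 5, 9)]

# A cursor that started with a real commit point is notified on each advance.
healthy = ExhaustCursor(last_known_committed=3)
healthy_results = [should_send_empty_batch(healthy, t) for t in (5, 5, 9)]
```

In this model the stuck cursor can only recover if it is replaced by a new cursor whose request carries a non-null value, which is why the initial null matters so much for an exhaust stream that reuses one request.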
causes:
- SERVER-79885 Oplog fetching getMore should not set null lastKnownCommittedOpTime if it is not using exhaust cursors (Closed)

related to:
- SERVER-58721 processReplSetInitiate does not set a stableTimestamp or take a stable checkpoint (Closed)
- SERVER-68514 Delay announcement of new primary until first oplog entry in term is majority committed (Backlog)
- SERVER-53813 Avoid serving stale majority reads on new primary after election (Closed)