[SERVER-45753] Skip waiting for OpTime with stale term during stepDown Created: 24/Jan/20  Updated: 29/Oct/23  Resolved: 28/Jan/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-02-10
Participants:
Linked BF Score: 50

 Description   

SERVER-43949 added an invariant that awaitReplication only waits for an OpTime in the term that the primary node is currently writing in. This is a safe assumption in the waitForWriteConcern code path, because we add an OpTime to the _replicationWaiterList only if its term is the same as the current term, and we clear the _replicationWaiterList on stepDown.

However, stepDown adds an OpTime to the _replicationWaiterList without checking its term. So if a stepDown starts in the middle of primary catch-up mode, the lastAppliedOpTime will still carry a previous term. Once primary catch-up mode finishes, the primary writes a no-op in its new term. Then, when we try to wake up the waiters in the _replicationWaiterList, we hit this invariant.

In fact, if the lastAppliedOpTime we get during stepDown has a stale term compared to the current term, then either the current primary has not yet written any oplog entry, or this is an unconditional stepDown on hearing a higher term. So it is more correct to simply skip waiting for a majority. It is not the current primary's responsibility to wait for something written in the previous term to become majority committed before it steps down; the previous primary should already have done so (assuming it was not force stepped down).
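
The proposed check can be illustrated with a minimal, self-contained sketch. This is not the server's code: the OpTime struct, the WaiterList class, and the registerStepDownWaiter function are hypothetical stand-ins invented for illustration, and the real ReplicationCoordinator logic involves locking and wake-up handling that is omitted here. The sketch assumes only what the description states: a waiter is registered when the OpTime's term equals the current term, and skipped otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the server's OpTime (term + timestamp).
struct OpTime {
    std::int64_t term;
    std::uint64_t timestamp;
};

// Hypothetical model of the waiter list; not the real _replicationWaiterList type.
class WaiterList {
public:
    void add(const OpTime& opTime) { _waiters.push_back(opTime); }
    std::size_t size() const { return _waiters.size(); }

private:
    std::vector<OpTime> _waiters;
};

// Registers a stepDown waiter only when lastApplied was written in the current term.
// Returns true if a waiter was added, false if waiting for majority was skipped.
bool registerStepDownWaiter(WaiterList& waiters,
                            const OpTime& lastApplied,
                            std::int64_t currentTerm) {
    if (lastApplied.term != currentTerm) {
        // Stale term: this primary has written nothing in its own term yet (or is
        // stepping down on hearing a higher term), so there is nothing to wait for.
        return false;
    }
    waiters.add(lastApplied);
    return true;
}
```

With this guard, every OpTime that reaches the waiter list from the stepDown path is in the current term, which is the same condition the waitForWriteConcern path already enforces.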



 Comments   
Comment by Githook User [ 28/Jan/20 ]

Author: Lingzhi Deng (lingzhi.deng@mongodb.com, username: ldennis)

Message: SERVER-45753: Skip waiting for OpTime with stale term during stepDown
Branch: master
https://github.com/mongodb/mongo/commit/3ca511007e865578b8c81ea82f9cfc618f0dc91c
