Priority: Major - P3
Resolution: Won't Fix
Affects Version/s: 4.1.5
Fix Version/s: None
Currently, replica set nodes can learn about a higher term via heartbeats, the oplog fetcher, and commands (such as find and getMore). When the term is learned via the oplog fetcher, it calls ReplicationCoordinatorImpl::_processReplSetMetadata_inlock, which updates the term only if the sync source's config version is the same as the node's own. That config version check is missing in the heartbeat, find, and getMore paths before the term is updated.
Also note that ReplicationCoordinatorImpl::_handleHeartbeatResponse updates the term in two places:
- ReplicationCoordinatorImpl::_processReplSetMetadata_inlock - Does the config version check.
- An explicit call to _updateTerm_inlock - Does not do the config version check. As a code cleanup, we should remove this call, since it is redundant.
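The missing check can be illustrated with a minimal, self-contained sketch. It is not the server implementation: NodeState, TermUpdate and updateTermIfConfigMatches are hypothetical names introduced only for illustration, and the only behavior modeled is "adopt a higher term only when the sender's config version equals ours".

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a node's replication state.
// These are NOT the actual MongoDB server types.
struct NodeState {
    std::int64_t term;
    std::int64_t configVersion;
    bool isPrimary;
};

enum class TermUpdate { kIgnoredConfigMismatch, kAlreadyUpToDate, kUpdated };

// Adopt remoteTerm only if the sender's config version matches our own,
// mirroring the check the ticket attributes to _processReplSetMetadata_inlock.
inline TermUpdate updateTermIfConfigMatches(NodeState& self,
                                            std::int64_t remoteTerm,
                                            std::int64_t remoteConfigVersion) {
    if (remoteConfigVersion != self.configVersion) {
        // The sender is on a different config; its term is not trusted.
        return TermUpdate::kIgnoredConfigMismatch;
    }
    if (remoteTerm <= self.term) {
        return TermUpdate::kAlreadyUpToDate;
    }
    self.term = remoteTerm;
    // A primary that learns of a higher term steps down.
    self.isPrimary = false;
    return TermUpdate::kUpdated;
}
```

In this sketch, routing the heartbeat, find, and getMore paths through one gated helper would leave a single place where the term can change, which is also why the explicit ungated call looks redundant.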
Note: This bug was observed in this particular upgrade/downgrade sequence (pv1 -> pv0 -> pv1), where it led to an unnecessary stepdown.
1) Start a replica set in pv1.
2) Insert some documents in pv1 (term = 1).
3) Downgrade to pv0 while the secondaries are still replicating the documents from the previous pv1 (term = 1).
4) Upgrade to pv1 before the secondaries downgrade to pv0.
5) The current primary, which is in term 0, receives heartbeats from the secondaries, which think they are still in term 1 (from step 1).
6) As a result, the current primary updates its term to 1, steps down, and starts a new election for term 2.
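Steps 5 and 6 can be modeled with a small sketch that compares an unchecked heartbeat path against one that checks the config version first. Everything here is a hypothetical model, not server code: PrimaryModel, HeartbeatModel and both handlers are invented for illustration, and the config version numbers (3 for the primary, 1 for the lagging secondaries) are assumed values chosen only so that the two differ.

```cpp
#include <cstdint>

// Hypothetical model of the primary after pv1 -> pv0 -> pv1; not MongoDB code.
struct PrimaryModel {
    std::int64_t term = 0;           // term 0, as in step 5
    std::int64_t configVersion = 3;  // assumed value: newest config
    bool isPrimary = true;
    std::int64_t electionTerm = -1;  // term of the election started after stepdown
};

// Hypothetical heartbeat payload carrying the sender's term and config version.
struct HeartbeatModel {
    std::int64_t term;
    std::int64_t configVersion;
};

// Behavior described in the ticket: the heartbeat term is adopted
// without comparing config versions.
inline void handleHeartbeatUnchecked(PrimaryModel& p, const HeartbeatModel& hb) {
    if (hb.term > p.term) {
        p.term = hb.term;             // step 6: term becomes 1
        p.isPrimary = false;          // step 6: primary steps down
        p.electionTerm = p.term + 1;  // step 6: new election for term 2
    }
}

// Same handler, gated on matching config versions.
inline void handleHeartbeatChecked(PrimaryModel& p, const HeartbeatModel& hb) {
    if (hb.configVersion != p.configVersion) {
        return;  // stale sender: ignore its term
    }
    handleHeartbeatUnchecked(p, hb);
}
```

With a heartbeat of HeartbeatModel{1, 1} from a lagging secondary, the unchecked handler reproduces the stepdown and the election for term 2, while the checked handler leaves the primary in term 0.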