[SERVER-38504] On Primary, to verify that our lastAppliedOpTime is lagged behind the TopologyCoordinator::_lastCommittedOpTime, we should compare both its term and timestamp for pv1. Created: 10/Dec/18  Updated: 06/Dec/22  Resolved: 30/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.9
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Suganthi Mani
Assignee: Backlog - Replication Team
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-35608 Invariant that term from lastAppliedO... Closed
is related to DOCS-12253 Add a comment in "Modify Replica Set ... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

Consider the following upgrade->downgrade->upgrade (pv1->pv0->pv1) sequence.

1) Start a replica set in pv1.

2) Insert some documents in pv1 (term = 1).

3) Downgrade to pv0 while the secondaries are still replicating the documents from the previous pv1 (term = 1).

4) Upgrade to pv1 before the secondaries have downgraded to pv0, while their lastDurableOpTime is still in term 1.

5) When the current primary attempts to write in its current term (i.e., term 0), it does the following two things:

       - Moves its lastAppliedOpTime forward.

       - Sets its stableOpTime to the new lastAppliedOpTime, but only if the new lastAppliedOpTime <= lastCommittedOpTime.

Currently, if both lastAppliedOpTime and lastCommittedOpTime are from pv1, we first compare their terms and, only if the terms are equal, compare their timestamps. In the above sequence, say our lastCommittedOpTime is (ts1, t:1) and our lastAppliedOpTime is (ts2, t:0), where ts1 < ts2. Because term 0 is less than term 1, the condition lastAppliedOpTime <= lastCommittedOpTime is satisfied even though ts2 > ts1, so we update our stableTimestamp. As a result, it leads to an invariant failure in line 3398.
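
The comparison described above can be illustrated with a minimal sketch. This is not the server code: the OpTime class, the integer timestamps, and both helper functions are hypothetical stand-ins, and the stricter check is only one possible reading of "compare both its term and timestamp" from the ticket title.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpTime:
    """Simplified stand-in for a pv1 optime: a (timestamp, term) pair."""
    timestamp: int  # assumption: a plain integer instead of a real Timestamp
    term: int

    def __le__(self, other):
        # Term-first ordering: timestamps only matter when the terms are equal.
        return (self.term, self.timestamp) <= (other.term, other.timestamp)


def lagged_term_first(applied, committed):
    """Check as described in the ticket: plain term-first optime comparison."""
    return applied <= committed


def lagged_term_and_timestamp(applied, committed):
    """Hypothetical stricter check: neither term nor timestamp may be ahead."""
    return applied.term <= committed.term and applied.timestamp <= committed.timestamp


# Values mirroring the scenario: ts1 < ts2, committed in term 1, applied in term 0.
last_committed = OpTime(timestamp=10, term=1)  # (ts1, t:1)
last_applied = OpTime(timestamp=20, term=0)    # (ts2, t:0)

print(lagged_term_first(last_applied, last_committed))
print(lagged_term_and_timestamp(last_applied, last_committed))
```

With these values the term-first check treats lastAppliedOpTime as lagging behind lastCommittedOpTime although its timestamp is larger, while the stricter check does not.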



 Comments   
Comment by Tess Avitabile (Inactive) [ 30/May/19 ]

We have not seen this in the field. Our documentation advises users to ensure that an oplog entry from the new protocol version has replicated to all nodes before doing a second change in protocol version.
