[SERVER-38505] For pv1, to determine if the oplog entries are applied out of order, we should compare both the term and timestamp of firstOpTimeInBatch and lastAppliedOpTimeAtStartOfBatch Created: 10/Dec/18  Updated: 06/Dec/22  Resolved: 30/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-35608 Invariant that term from lastAppliedO... Closed
is related to DOCS-12253 Add a comment in "Modify Replica Set ... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

Consider the following upgrade->downgrade->upgrade (pv1->pv0->pv1) sequence.

1) Start a replica set in pv1.

2) Insert some documents in pv1 (term = 1).

3) Downgrade to pv0 while the secondaries are still replicating the documents from the previous pv1 (term = 1).

4) Upgrade to pv1 before the secondaries downgrade to pv0.

5) The secondaries learn the new term (term 0) from a heartbeat received from the primary while their lastAppliedOpTimes are still in term 1.

6) Let's say that, on the secondaries, the node's lastAppliedOpTime and lastFetchedOpTime are both (100, t:1). When the secondaries try to replicate oplog entries from the primary, they add a filter to the find command so that it fetches only the oplog entries whose timestamp is greater than or equal to the timestamp of their lastFetchedOpTime, "100". When a secondary receives a batch combining oplog entries from step 2 (pv1), step 3 (pv0) and step 4 (pv1) (say (100, t:1) | (101, t:1) || (102, t:-1) || (103, t:0)), it applies those entries and tries to move its lastAppliedOpTime forward to the last entry in the batch, (103, t:0). Unfortunately, it can't move lastAppliedOpTime forward, because (103, t:0) < (100, t:1).

7) Assume that the secondary receives a next batch starting with (104, t:0). Before applying the batch, we verify that oplog entries are not applied out of order by checking whether the optime of the first entry in the batch is less than or equal to lastAppliedOpTime. Since (104, t:0) is less than our lastAppliedOpTime (100, t:1), the check fails and leads to an fassert failure.
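
The comparisons in steps 6 and 7 can be sketched as follows. This is only an illustrative model, not the server code: it assumes that a pv1 optime orders by term first and timestamp second, and the names OpTime, fetch_filter, advance_last_applied and is_out_of_order are hypothetical. The forward-only update in advance_last_applied is also an assumption made to model problem 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    # Field order matters: order=True compares (term, timestamp) as a tuple,
    # so the term dominates. This ordering is an assumption of the sketch.
    term: int
    timestamp: int


def ot(timestamp, term):
    """Build an OpTime from the issue's (timestamp, t:term) notation."""
    return OpTime(term=term, timestamp=timestamp)


def fetch_filter(oplog, last_fetched):
    """Step 6 filter: keep entries with timestamp >= last fetched timestamp."""
    return [e for e in oplog if e.timestamp >= last_fetched.timestamp]


def advance_last_applied(last_applied, batch):
    """Hypothetical forward-only update: never move lastApplied backwards."""
    return max(last_applied, batch[-1])


def is_out_of_order(first_in_batch, last_applied_at_start):
    """Step 7 check: a batch must start strictly after lastApplied."""
    return first_in_batch <= last_applied_at_start


# Primary oplog across pv1 (t:1), pv0 (t:-1) and the new pv1 (t:0).
primary_oplog = [ot(99, 1), ot(100, 1), ot(101, 1), ot(102, -1), ot(103, 0)]

last_applied = ot(100, 1)
batch = fetch_filter(primary_oplog, last_applied)

# Problem 1: the last entry of the batch compares lower, so nothing advances.
stuck = batch[-1] < last_applied
last_applied_after = advance_last_applied(last_applied, batch)

# Problem 2: the next batch starts at (104, t:0), which is <= (100, t:1).
out_of_order = is_out_of_order(ot(104, 0), last_applied_after)
```

With the values from the steps above, the batch runs from (100, t:1) to (103, t:0), lastAppliedOpTime stays at (100, t:1), and the out-of-order check trips on (104, t:0), even though every timestamp is strictly increasing.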

 Here we see 2 problems:

    1) Step 6, where lastAppliedOpTime does not move forward because a single batch contains oplog entries from the previous pv1, pv0 and the new pv1.

    2) Step 7, where we get an fassert failure stating that oplog entries are applied out of order.

In practice, problem 2 won't occur, because the node would hit an invariant during step 6 while trying to move its lastAppliedOpTime forward (see SERVER-35608).



 Comments   
Comment by Tess Avitabile (Inactive) [ 30/May/19 ]

We have not seen this in the field. Our documentation advises users to ensure that an oplog entry from the new protocol version has replicated to all nodes before doing a second change in protocol version.
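
The documented precaution can be sketched as a simple pre-check. This is only a sketch with hypothetical names (members_missing_entry and its arguments); how the timestamps are collected from each node is not shown. It compares timestamps only, on the assumption that terms are not comparable across protocol versions, as the description above illustrates.

```python
def members_missing_entry(first_new_pv_ts, last_applied_ts_by_member):
    """Return the members that have not yet applied the first oplog entry
    written under the new protocol version (hypothetical helper).

    first_new_pv_ts: timestamp of that first oplog entry.
    last_applied_ts_by_member: member name -> last applied timestamp.
    """
    return sorted(
        name
        for name, ts in last_applied_ts_by_member.items()
        if ts < first_new_pv_ts
    )


# Example: the first pv0 entry has timestamp 102 and one secondary lags.
lagging = members_missing_entry(
    102, {"primary": 103, "secondary1": 101, "secondary2": 102}
)
safe_to_change_protocol_version_again = not lagging
```

In this example secondary1 is still behind, so a second protocol version change would be deferred until it catches up.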
