[SERVER-35608] Invariant that term from lastAppliedOptime is never greater than our current term Created: 14/Jun/18  Updated: 29/Oct/23  Resolved: 11/Dec/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.19, 3.6.11, 4.0.6, 4.1.7

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Suganthi Mani
Resolution: Fixed Votes: 2
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-38366 Replica set nodes update the term wit... Closed
is related to SERVER-38504 On Primary, to verify that our lastAp... Closed
is related to SERVER-38505 For pv1, to determine if the oplog e... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0, v3.6, v3.4
Sprint: Repl 2018-11-05, Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17
Participants:
Case:

 Description   

Add this invariant as a sanity check, in case a user manually modifies their local database in a way that puts the node into this state.



 Comments   
Comment by Githook User [ 23/Jan/19 ]

Author:

{'username': 'smani87', 'email': 'suganthi.mani@mongodb.com', 'name': 'Suganthi Mani'}

Message: SERVER-35608 Added an invariant to make sure that optime with lower and higher term than the current lastAppliedOpTime will have lower and higher timestamp respectively. And, provided both the optime and the current lastAppliedOpTime terms are in pv1.

(cherry picked from commit b91aaa5bbc54a176cc61e5051cb6be857747b068)
Branch: v3.6
https://github.com/mongodb/mongo/commit/1ce959ee43baeaa6679d8b50c2e80d4650e94e3a

Comment by Githook User [ 09/Jan/19 ]

Author:

{'username': 'smani87', 'email': 'suganthi.mani@mongodb.com', 'name': 'Suganthi Mani'}

Message: SERVER-35608 Added an invariant to make sure that optime with lower and higher term than the current lastAppliedOpTime will have lower and higher timestamp respectively. And, provided both the optime and the current lastAppliedOpTime terms are in pv1.

(cherry picked from commit b91aaa5bbc54a176cc61e5051cb6be857747b068)
Branch: v3.4
https://github.com/mongodb/mongo/commit/ee1e46cee281560bf13529c6db75cfb317703780

Comment by Githook User [ 08/Jan/19 ]

Author:

{'username': 'smani87', 'email': 'suganthi.mani@mongodb.com', 'name': 'Suganthi Mani'}

Message: SERVER-35608 Added an invariant to make sure that optime with lower and higher term than the current lastAppliedOpTime will have lower and higher timestamp respectively. And, provided both the optime and the current lastAppliedOpTime terms are in pv1.

(cherry picked from commit b91aaa5bbc54a176cc61e5051cb6be857747b068)
Branch: v4.0
https://github.com/mongodb/mongo/commit/938f2b25a50a4c907b736ffe81546ae4c42e4f0c

Comment by Githook User [ 11/Dec/18 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-35608 Added an invariant to make sure that optime with lower and higher term than the current lastAppliedOpTime will have lower and higher timestamp respectively. And, provided both the optime and the current lastAppliedOpTime terms are in pv1.
Branch: master
https://github.com/mongodb/mongo/commit/b91aaa5bbc54a176cc61e5051cb6be857747b068

Comment by Suganthi Mani [ 11/Dec/18 ]

We considered adding the invariant "lastAppliedOpTime's term should always be less than or equal to our current term" in two places:

  1. ReplicationCoordinatorImpl::_setMyLastAppliedOpTime_inlock
  2. ReplicationCoordinatorImpl::_finishLoadLocalConfig

Adding the invariant in ReplicationCoordinatorImpl::_setMyLastAppliedOpTime_inlock may cause failures in the valid cases below, where the lastAppliedOpTime's term can legitimately be greater than the current term.

  1. Downgrade (pv1 -> pv0)
    On secondaries, learning the term via heartbeat and applying the oplog are asynchronous. Consider the case where a secondary learns its current term of -1 (i.e., pv0) via the heartbeat response from the primary, followed by application of an oplog batch containing entries from the previous protocol version, pv1, whose terms are greater than -1.
  2. Upgrade (pv0 -> pv1)
    Consider the case where a secondary applies oplog entries (whose optimes have term 0) from the current protocol version, pv1, before the heartbeat response that informs the secondary of the new term 0 (i.e., pv1) arrives.
  3. Downgrade, then upgrade (pv1 -> pv0 -> pv1)
    Consider the case where a secondary learns its current term of 0 (i.e., pv1) via the heartbeat response from the primary while still applying oplog entries from the first pv1 period, whose optimes have terms >= 1.

Adding the invariant only in ReplicationCoordinatorImpl::_finishLoadLocalConfig may cause failures in the valid downgrade sequence below, where the lastAppliedOpTime's term can be greater than the current term.

  1. Start a replica set in pv1.
  2. Insert some documents in pv1 (term = 1).
  3. Downgrade to pv0 while a secondary is still replicating the documents from the previous protocol version, pv1 (term = 1).
  4. The secondary learns its new term 0 (i.e., pv0) via a heartbeat response and persists that information in the "system.replset" collection.
  5. The secondary crashes.

When the secondary reboots, its current term will be -1 (pv0), while the last entry in the "oplog" collection is from pv1 (and will be set as our lastAppliedOpTime). This is similar to the problem described in HELP-6818. It is hard for us to distinguish the manual case from this self-induced one.

To prevent data loss in cases like HELP-6818, the new fix fails at the time of data insertion rather than during the node's startup phase. The contract we have right now is that, in pv1, oplog entries are ordered by non-decreasing term and strictly increasing timestamp. So we added an invariant that an optime with a lower term than the current lastAppliedOpTime must have a lower timestamp, and an optime with a higher term must have a higher timestamp, provided both the optime's and the current lastAppliedOpTime's terms are in pv1.

Comment by Gregory McKeon (Inactive) [ 27/Nov/18 ]

This work revealed bug(s) in upgrade/downgrade/upgrade of protocol version.

Comment by Suganthi Mani [ 15/Nov/18 ]

greg.mckeon, aiming for the patch to be in CR by EOD.

Generated at Thu Feb 08 04:40:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.