[SERVER-35608] Invariant that term from lastAppliedOptime is never greater than our current term Created: 14/Jun/18 Updated: 29/Oct/23 Resolved: 11/Dec/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.19, 3.6.11, 4.0.6, 4.1.7 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Suganthi Mani |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.0, v3.6, v3.4 |
| Sprint: | Repl 2018-11-05, Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
As a sanity check, in case a user manually modifies their local database in a way that puts the node into this state (a lastAppliedOpTime whose term is greater than the node's current term).
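The check named in the title is a single comparison. The C++ below is a minimal sketch of it, not MongoDB's code: `OpTime`, `kUninitializedTerm`, `naiveTermCheck`, and `invariantTermNotAhead` are hypothetical, simplified stand-ins, the timestamp is reduced to one integer, and a pv0 term is modeled as -1 following Suganthi Mani's comment below.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an optime: a timestamp (reduced here
// to one increasing counter) plus the election term it was written in.
struct OpTime {
    std::uint64_t timestamp;
    std::int64_t term;
};

// Assumed sentinel: pv0 optimes and a pv0 node's current term are modeled as -1.
constexpr std::int64_t kUninitializedTerm = -1;

// The naive check from the ticket title: the term of lastAppliedOpTime must
// never be greater than the node's current term.
bool naiveTermCheck(const OpTime& lastApplied, std::int64_t currentTerm) {
    return lastApplied.term <= currentTerm;
}

// Invariant form of the same check. Note: assert() is compiled out when
// NDEBUG is defined, unlike an always-on server invariant.
void invariantTermNotAhead(const OpTime& lastApplied, std::int64_t currentTerm) {
    assert(naiveTermCheck(lastApplied, currentTerm) &&
           "lastAppliedOpTime term is greater than current term");
}
```

For example, `naiveTermCheck(OpTime{100, 3}, kUninitializedTerm)` returns false: a node restarted in pv0 whose last oplog entry came from pv1 fails this check even though that state can be legitimate, which is why the comments below move away from this form.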
| Comments |
| Comment by Githook User [ 23/Jan/19 ] |
|
Author: Suganthi Mani (smani87) <suganthi.mani@mongodb.com>
Message: (cherry picked from commit b91aaa5bbc54a176cc61e5051cb6be857747b068) |
| Comment by Githook User [ 09/Jan/19 ] |
|
Author: Suganthi Mani (smani87) <suganthi.mani@mongodb.com>
Message: (cherry picked from commit b91aaa5bbc54a176cc61e5051cb6be857747b068) |
| Comment by Githook User [ 08/Jan/19 ] |
|
Author: Suganthi Mani (smani87) <suganthi.mani@mongodb.com>
Message: (cherry picked from commit b91aaa5bbc54a176cc61e5051cb6be857747b068) |
| Comment by Githook User [ 11/Dec/18 ] |
|
Author: Suganthi Mani (smani87) <suganthi.mani@mongodb.com>
Message: |
| Comment by Suganthi Mani [ 11/Dec/18 ] |
|
We considered adding the invariant "lastAppliedOptime's term should always be less than or equal to our current term" in two places:
Adding the invariant in ReplicationCoordinatorImpl::_setMyLastAppliedOpTime_inlock may fail for the valid cases below, in which the lastAppliedOptime's term can be greater than the current term.
Adding the invariant only in ReplicationCoordinatorImpl::_finishLoadLocalConfig may fail for the valid downgrade sequence below, in which the lastAppliedOptime's term can be greater than the current term.
When a secondary restarts, its current term will be -1 (pv0), while the last entry in its "oplog" collection is from pv1 (and that entry will be set as our lastAppliedOpTime). This is similar to the problem described in HELP-6818. It's hard for us to distinguish between the manually induced case and the self-induced case.

To prevent data loss in cases like HELP-6818, our new fix is to fail at the time of data insertion instead of failing during the node's startup phase. The contract we have right now is that, in pv1, oplog entries are ordered by non-decreasing term and strictly increasing timestamp. So we added an invariant that an optime with a lower (respectively, higher) term than the current lastAppliedOpTime must have a lower (respectively, higher) timestamp, provided that both the optime's term and the current lastAppliedOpTime's term are from pv1. |
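The ordering check described above can be sketched as a pure function. This is a hedged C++ illustration under stated assumptions, not the committed code: `OpTime` is reduced to an integer timestamp plus a term, -1 marks a pv0 term, and `isPv1`, `respectsPv1Ordering`, and `invariantPv1Ordering` are hypothetical names. Equal terms are left unconstrained because the comment only describes the lower-term and higher-term cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an optime.
struct OpTime {
    std::uint64_t timestamp;  // simplified: one increasing counter
    std::int64_t term;        // assumed: -1 marks a pv0 optime
};

constexpr std::int64_t kUninitializedTerm = -1;

bool isPv1(const OpTime& t) { return t.term != kUninitializedTerm; }

// pv1 contract: oplog entries are ordered by non-decreasing term and strictly
// increasing timestamp. So, relative to lastApplied, a higher-term optime must
// have a higher timestamp and a lower-term optime a lower timestamp.
bool respectsPv1Ordering(const OpTime& lastApplied, const OpTime& newOpTime) {
    // Only enforced when both optimes come from pv1.
    if (!isPv1(lastApplied) || !isPv1(newOpTime)) {
        return true;
    }
    if (newOpTime.term > lastApplied.term) {
        return newOpTime.timestamp > lastApplied.timestamp;
    }
    if (newOpTime.term < lastApplied.term) {
        return newOpTime.timestamp < lastApplied.timestamp;
    }
    return true;  // same term: not covered by the invariant described above
}

// Checked when new data is applied rather than at startup. assert() is
// compiled out under NDEBUG, unlike an always-on server invariant.
void invariantPv1Ordering(const OpTime& lastApplied, const OpTime& newOpTime) {
    assert(respectsPv1Ordering(lastApplied, newOpTime) &&
           "optime violates pv1 term/timestamp ordering");
}
```

With lastApplied at timestamp 200 and term 5, an optime at timestamp 150 and term 6 is rejected (higher term but lower timestamp), while any comparison involving a pv0 optime passes, consistent with the downgrade case described above.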
| Comment by Gregory McKeon (Inactive) [ 27/Nov/18 ] |
|
This work revealed bug(s) in upgrade/downgrade/upgrade of protocol version. |
| Comment by Suganthi Mani [ 15/Nov/18 ] |
|
greg.mckeon, aiming for the patch to be in CR by EOD. |