[SERVER-41261] Use the oplog entry after the common point to calculate rollbackTimeLimitSecs
Created: 21/May/19  Updated: 29/Oct/23  Resolved: 12/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.0-rc3, 4.0.13, 4.3.1

Type: Improvement
Priority: Major - P3
Reporter: Alyson Cabral (Inactive)
Assignee: Jason Chan
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Documented
is documented by DOCS-12887 Investigate changes in SERVER-41261: ... Closed
Related
Backwards Compatibility: Fully Compatible
Backport Requested: v4.2, v4.0
Sprint: Repl 2019-07-01, Repl 2019-07-15
Participants:
Case:

 Description   

In Atlas you can pause a cluster, effectively shutting the nodes down for a period of time.

Let's assume we pause for more than 24 hours and that all the nodes are current, having committed all the writes. When the nodes are restarted at the same time, we are seeing two nodes run at once and two branches of history form. Eventually, one node goes into rollback and hits an fassert because the common point is more than 24 hours behind, even though we are only rolling back one or two very recent oplog entries. In this case the common point is from over 24 hours ago, whereas the oplog entry immediately after the common point is from less than 5 minutes ago.

While we believe the problem of two nodes running at the same time is being fixed by SERVER-40336, it still makes sense to change this calculation in case true network partitions occur after unpausing. Resolving this manually is a headache.
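
The difference between the two calculations can be illustrated with a small sketch. This is not the server code: the function names, the 24-hour default, and the choice to measure against the newest oplog entry are assumptions made for illustration, using only the timeline given in the description above.

```python
from datetime import datetime, timedelta

# Assumption: a 24-hour limit, matching the figure quoted in the description.
ROLLBACK_TIME_LIMIT = timedelta(hours=24)


def exceeds_limit_from_common_point(top_of_oplog, common_point,
                                    limit=ROLLBACK_TIME_LIMIT):
    """Hypothetical helper: measure the rollback window from the common point."""
    return top_of_oplog - common_point > limit


def exceeds_limit_from_first_entry_after(top_of_oplog,
                                         first_entry_after_common_point,
                                         limit=ROLLBACK_TIME_LIMIT):
    """Hypothetical helper: measure from the first entry after the common point."""
    return top_of_oplog - first_entry_after_common_point > limit


# Timeline from the description: the cluster was paused for over 24 hours,
# and the divergent writes are only a few minutes old.
top_of_oplog = datetime(2019, 5, 21, 12, 0, 0)
common_point = top_of_oplog - timedelta(hours=25)
first_entry_after_common_point = top_of_oplog - timedelta(minutes=5)

old_check = exceeds_limit_from_common_point(top_of_oplog, common_point)
new_check = exceeds_limit_from_first_entry_after(
    top_of_oplog, first_entry_after_common_point)
print(old_check, new_check)  # True False
```

With the common point as the reference, the idle time during the pause counts toward the limit and the rollback is rejected. With the entry after the common point as the reference, only the span of writes actually being rolled back counts, so the short divergence stays within the limit.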



 Comments   
Comment by Githook User [ 19/Aug/19 ]

Author: Jason Chan (jasonjhchan) <jason.chan@10gen.com>

Message: SERVER-41261 Use the oplog entry after the common point to calculate rollbackTimeLimitSecs

(cherry picked from commit a5d088eefec42927f339ff9288f9eb078d5a8686)
Branch: v4.0
https://github.com/mongodb/mongo/commit/f5399b423d4f3d1603dd9aa1a2fdf2da69795b5d

Comment by Githook User [ 17/Jul/19 ]

Author: Jason Chan (jasonjhchan) <jason.chan@10gen.com>

Message: SERVER-41261 Use the oplog entry after the common point to calculate rollbackTimeLimitSecs

(cherry picked from commit a5d088eefec42927f339ff9288f9eb078d5a8686)
Branch: v4.2
https://github.com/mongodb/mongo/commit/7f56158f69facfc1b1504d62876d7f6f11848297

Comment by Githook User [ 12/Jul/19 ]

Author: Jason Chan (jasonjhchan) <jason.chan@10gen.com>

Message: SERVER-41261 Use the oplog entry after the common point to calculate rollbackTimeLimitSecs
Branch: master
https://github.com/mongodb/mongo/commit/a5d088eefec42927f339ff9288f9eb078d5a8686
