[SERVER-5930] rollback loop should be smarter Created: 25/May/12  Updated: 06/Dec/22  Resolved: 02/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Azat Khuzhin Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: datarepl3.2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-18035 Data Replicator: Refactor Rollback Code Closed
is related to SERVER-7244 Rollback should handle system.indexes... Closed
is related to SERVER-13573 Retry rollback FindCommonPoint before... Closed
is related to SERVER-23392 Increase Replication Rollback (Data) ... Closed
Assigned Teams:
Replication
Participants:
Case:

 Description   

When mongod rolls back data after a master failover, retrying the rollback over and over after it has failed against the rollback limit (currently 300mb) is not a good idea.

mongod should not retry when a rollback fails because of the limit. After a restart it could attempt the rollback once more, but no further; alternatively, it could record this state in a collection and restore that information at startup (see the sketch below).
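A minimal sketch of the proposed behavior, with a plain file standing in for the suggested collection; the names here (kMaxRollbackAttempts, tryRollback, rollback_attempts.txt) are hypothetical illustrations, not the server's actual code:

```cpp
// rollback_retry_guard.cpp -- sketch of capping rollback retries across restarts.
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical cap: try once after the failure, once more after a restart,
// then stop.
constexpr int kMaxRollbackAttempts = 2;

// Stand-in for "write this info to some collection": persist the attempt
// counter so it survives a mongod restart.
int loadAttempts(const std::string& path) {
    std::ifstream in(path);
    int n = 0;
    in >> n;  // a missing or empty file reads as 0
    return n;
}

void saveAttempts(const std::string& path, int n) {
    std::ofstream out(path, std::ios::trunc);
    out << n;
}

// Placeholder for the real rollback; returns false when the data to roll
// back exceeds the (historical) 300mb limit.
bool tryRollback() {
    return false;  // simulate the failure mode described in this ticket
}

int main() {
    const std::string statePath = "rollback_attempts.txt";
    int attempts = loadAttempts(statePath);

    if (attempts >= kMaxRollbackAttempts) {
        std::cerr << "rollback already failed " << attempts
                  << " time(s); refusing to loop -- manual resync required\n";
        return 1;  // exit instead of retrying forever
    }

    if (tryRollback()) {
        saveAttempts(statePath, 0);  // success: clear the counter
        return 0;
    }

    saveAttempts(statePath, attempts + 1);  // remember the failure across restarts
    std::cerr << "rollback failed (attempt " << attempts + 1 << " of "
              << kMaxRollbackAttempts << ")\n";
    return 1;
}
```

Because the counter is persisted, a restart alone no longer resets the loop: once the allowed attempts are used up, the node exits and waits for an operator (or a full resync) instead of retrying forever.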



 Comments   
Comment by Eric Milkie [ 02/May/18 ]

The rollback limit has been removed and the rollback algorithm rewritten to be smarter.

Comment by Scott Hernandez (Inactive) [ 27/Apr/15 ]

Content from Kristina's post that requested this issue be created:

Suppose you had a 3-node set, X, Y, Z, where X is primary, Y is secondary, and Z is an arbiter. Y is a week behind, about to go stale. Then X goes down, so Y becomes primary for a few minutes and does some writes. Then X comes back up. Should X roll back a week's worth of data automatically? The 300mb limit should probably be a bit more of an advanced measure, but it's there to prevent that type of thing.

I agree that having it endlessly loop on failure is a bad idea. It should just exit or do something smarter.

I don't see how it could be a symptom of how you shut it down; it's caused by a lot of data not being written to the secondary, which means the secondary was behind.
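A toy illustration of the guard described above (the constant and function names are hypothetical, not the server's actual code): the limit draws a line between divergence small enough to undo automatically and divergence, like a week's worth of writes, that should require an operator's decision.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical mirror of the historical 300mb rollback limit.
constexpr std::uint64_t kRollbackLimitBytes = 300ull * 1024 * 1024;

// Returns true only when the diverged data is small enough that rolling
// it back automatically is considered safe.
bool shouldRollbackAutomatically(std::uint64_t divergedBytes) {
    return divergedBytes <= kRollbackLimitBytes;
}

int main() {
    std::cout << shouldRollbackAutomatically(50ull << 20) << "\n";  // 1: 50 MiB, proceed
    std::cout << shouldRollbackAutomatically(2ull << 30) << "\n";   // 0: 2 GiB, refuse
}
```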

Comment by Azat Khuzhin [ 25/May/12 ]

You can also read this thread in Google Groups:
https://groups.google.com/d/msg/mongodb-user/SZe0nCBCR80/n5aDQ2wb_6MJ
