[SERVER-9481] Server in replication set fluctuates between states SECONDARY and ROLLBACK if it can't rollback due to too much rollback data Created: 26/Apr/13 Updated: 11/Jul/16 Resolved: 20/Jul/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.1 |
| Fix Version/s: | 2.4.6, 2.5.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andreas Heck | Assignee: | Matt Dannenberg |
| Resolution: | Done | Votes: | 0 |
| Labels: | pull-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | Follow the guide at http://comerford.cc/2012/05/28/simulating-rollback-on-mongodb/ but use the following mongo shell command to create the data: for (var i = 0; i <= 100000000000000000; i++) { for (var j = 0; j <= 1000; j++) { db.rollback_test.insert({"a" : i * j}); } sleep(100); } Run this command on the primary until you are confident that you have at least 300 MB of data (e.g. by checking the output of df -h as a rough estimate). |
||||||||||||
| Participants: | |||||||||||||
| Description |
|
If you encounter a rollback situation in a replication set and the rollback data amounts to more than 300 megabytes or 30 minutes, the rollback will fail. Unfortunately, it is not clear from rs.status() that one node is in a state from which it cannot recover. The node with the rollback problem will try to roll back again and again, which can be seen in the mongodb.log: Fri Apr 26 13:28:55.957 [rsBackgroundSync] replSet syncing to: 10.128.128.102:27017 When you start the mongo shell on a replication set node and repeatedly execute the rs.status() command, you will see that the affected secondary goes into the ROLLBACK state for a few seconds at most, only to flip back to SECONDARY until it enters the next iteration of the rollback loop. I would expect the affected node to stay in ROLLBACK until the rollback succeeds. In a situation like this, where manual intervention is necessary, it might be a good idea for the affected instance to go to the FATAL state. |
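The flapping described above can be spotted from a series of state samples. A minimal sketch, assuming the samples are the `stateStr` values reported for the affected member by successive `rs.status()` calls; the helper name `countRollbackFlaps` is hypothetical and not part of MongoDB:

```javascript
// Hypothetical helper, not a MongoDB API.
// Counts how often a member drops from ROLLBACK straight back to SECONDARY
// in a chronologically ordered list of sampled state strings.
function countRollbackFlaps(states) {
  var flaps = 0;
  for (var i = 1; i < states.length; i++) {
    if (states[i - 1] === "ROLLBACK" && states[i] === "SECONDARY") {
      flaps++;
    }
  }
  return flaps;
}
```

In the mongo shell, the samples could be collected by reading `stateStr` from the matching entry of `rs.status().members` at short intervals. A count that keeps growing suggests the node is stuck in the rollback loop rather than recovering.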
| Comments |
| Comment by auto [ 02/Aug/13 ] |
|
Author: Matt Dannenberg (dannenberg) <matt.dannenberg@10gen.com> Message: |
| Comment by auto [ 02/Aug/13 ] |
|
Author: Andreas Heck (aheck) <aheck@gmx.de> Message: Only leave ROLLBACK when successful and go to FATAL when ROLLBACK is impossible Signed-off-by: Matt Dannenberg <matt.dannenberg@10gen.com> |
| Comment by auto [ 12/Jun/13 ] |
|
Author: Matt Dannenberg (dannenberg) <matt.dannenberg@10gen.com> Message: |
| Comment by auto [ 12/Jun/13 ] |
|
Author: Andreas Heck (aheck) <aheck@gmx.de> Message: Only leave ROLLBACK when successful and go to FATAL when ROLLBACK is impossible Signed-off-by: Matt Dannenberg <matt.dannenberg@10gen.com> |
| Comment by Matt Kangas [ 10/Jun/13 ] |
|
Verified that contributor agreement has been signed. |
| Comment by Randolph Tan [ 17/May/13 ] |
|
Hi aheck, Have you signed the contributor agreement? Thanks! |
| Comment by Andreas Heck [ 08/May/13 ] |
|
I patched the MongoDB 2.4.1 server so that it only leaves ROLLBACK after the rollback has succeeded and changes to FATAL if the rollback fails because there is too much data. This prevents the endless loop seen in the log. Are there any logical problems with that approach or problems with the patch itself? |
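The state handling proposed in this comment can be illustrated as a small transition function. This is only a sketch of the described behaviour, not the patch itself (the real change lives in the server's C++ code and is not shown in this ticket); the function name `nextMemberState` and the outcome labels `"succeeded"` and `"impossible"` are assumptions made for illustration:

```javascript
// Illustrative sketch only; names and outcome labels are hypothetical.
// Returns the state a member should move to, given its current state
// and the outcome of the latest rollback attempt.
function nextMemberState(currentState, rollbackOutcome) {
  if (currentState !== "ROLLBACK") {
    // Only members that are rolling back are affected.
    return currentState;
  }
  switch (rollbackOutcome) {
    case "succeeded":
      // Leave ROLLBACK only once the rollback has completed.
      return "SECONDARY";
    case "impossible":
      // For example too much rollback data: manual intervention is needed.
      return "FATAL";
    default:
      // Anything else: stay in ROLLBACK instead of flipping to SECONDARY.
      return "ROLLBACK";
  }
}
```

The point of the default branch is that a member never reports SECONDARY while a rollback is unresolved, which is what removes the SECONDARY/ROLLBACK flapping.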