-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 2.4.1
-
Component/s: Replication
-
ALL
-
If you encounter a rollback situation in a replication set and the rollback data is more than 300 megabytes/30 minutes the rollback will fail. Unfortunately it is not clear from rs.status() that one node is in a state where it can't recover from. The node with the rollback problem will try to roll back again and again which can be seen in the mongodb.log:
Fri Apr 26 13:28:55.957 [rsBackgroundSync] replSet syncing to: 10.128.128.102:27017
Fri Apr 26 13:29:01.030 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Fri Apr 26 13:29:01.030 [rsBackgroundSync] replSet rollback 0
Fri Apr 26 13:29:01.030 [rsBackgroundSync] replSet ROLLBACK
Fri Apr 26 13:29:01.030 [rsBackgroundSync] replSet rollback 1
Fri Apr 26 13:29:01.030 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Fri Apr 26 13:29:01.031 [rsBackgroundSync] replSet rollback 2 error findcommonpoint waiting a while before trying again
Fri Apr 26 13:29:01.873 [rsSync] replSet SECONDARY
Fri Apr 26 13:29:10.640 [conn613] end connection 10.128.128.102:53295 (2 connections now open)
Fri Apr 26 13:29:10.641 [initandlisten] connection accepted from 10.128.128.102:53297 #615 (3 connections now open)
Fri Apr 26 13:29:11.031 [rsBackgroundSync] replSet syncing to: 10.128.128.102:27017
Fri Apr 26 13:29:11.388 [conn614] end connection 10.128.128.98:41406 (2 connections now open)
Fri Apr 26 13:29:11.389 [initandlisten] connection accepted from 10.128.128.98:41408 #616 (3 connections now open)
Fri Apr 26 13:29:16.104 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Fri Apr 26 13:29:16.104 [rsBackgroundSync] replSet rollback 0
Fri Apr 26 13:29:16.105 [rsBackgroundSync] replSet ROLLBACK
Fri Apr 26 13:29:16.105 [rsBackgroundSync] replSet rollback 1
Fri Apr 26 13:29:16.105 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Fri Apr 26 13:29:16.105 [rsBackgroundSync] replSet rollback 2 error findcommonpoint waiting a while before trying again
Fri Apr 26 13:29:16.876 [rsSync] replSet SECONDARY
Fri Apr 26 13:29:26.105 [rsBackgroundSync] replSet syncing to: 10.128.128.102:27017
Fri Apr 26 13:29:31.172 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Fri Apr 26 13:29:31.172 [rsBackgroundSync] replSet rollback 0
Fri Apr 26 13:29:31.172 [rsBackgroundSync] replSet ROLLBACK
Fri Apr 26 13:29:31.172 [rsBackgroundSync] replSet rollback 1
Fri Apr 26 13:29:31.172 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Fri Apr 26 13:29:31.172 [rsBackgroundSync] replSet rollback 2 error findcommonpoint waiting a while before trying again
Fri Apr 26 13:29:31.879 [rsSync] replSet SECONDARY
Fri Apr 26 13:29:40.774 [conn615] end connection 10.128.128.102:53297 (2 connections now open)
Fri Apr 26 13:29:40.774 [initandlisten] connection accepted from 10.128.128.102:53299 #617 (3 connections now open)
Fri Apr 26 13:29:41.173 [rsBackgroundSync] replSet syncing to: 10.128.128.102:27017
Fri Apr 26 13:29:41.412 [conn616] end connection 10.128.128.98:41408 (2 connections now open)
Fri Apr 26 13:29:41.413 [initandlisten] connection accepted from 10.128.128.98:41410 #618 (3 connections now open)
Fri Apr 26 13:29:46.245 [rsBackgroundSync] replSet we are ahead of the sync source, will try to roll back
Fri Apr 26 13:29:46.245 [rsBackgroundSync] replSet rollback 0
Fri Apr 26 13:29:46.245 [rsBackgroundSync] replSet ROLLBACK
Fri Apr 26 13:29:46.245 [rsBackgroundSync] replSet rollback 1
Fri Apr 26 13:29:46.245 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Fri Apr 26 13:29:46.247 [rsBackgroundSync] replSet info rollback our last optime: Apr 26 10:52:56:f23
Fri Apr 26 13:29:46.248 [rsBackgroundSync] replSet info rollback their last optime: Apr 26 09:47:21:1000
Fri Apr 26 13:29:46.248 [rsBackgroundSync] replSet info rollback diff in end of log times: 3935 seconds
Fri Apr 26 13:29:46.248 [rsBackgroundSync] replSet rollback too long a time period for a rollback.
Fri Apr 26 13:29:46.248 [rsBackgroundSync] replSet rollback 2 error error not willing to roll back more than 30 minutes of data
Fri Apr 26 13:29:46.881 [rsSync] replSet SECONDARY
When you start the mongo shell on a replication set node and repeatedly execute the rs.status() command you will see that the affected secondary goes into the ROLLBACK state for a few seconds max, just to flip back into SECONDARY until it enters the next iteration of the rollback loop.
Actually I would expect the affected node to stay in ROLLBACK until the rollback succeeds. If we are in a situation like this where manual intervention is necessary, maybe it would a good idea for the affected instance to go to the FATAL state.
- is duplicated by
-
SERVER-9494 Rollback Throws Continuous Warnings And Does Not Appear To Be Completing
- Closed
- links to