[SERVER-9494] Rollback Throws Continuous Warnings And Does Not Appear To Be Completing Created: 28/Apr/13  Updated: 10/Dec/14  Resolved: 12/Jul/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Stuart Rench Assignee: Gianfranco Palumbo
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 5.6 Rackspace Cloud


Issue Links:
Duplicate
duplicates SERVER-9481 Server in replication set fluctuates ... Closed
Operating System: ALL
Participants:

 Description   

Secondary went into rollback mode, then generates a massive amount of logs (~18GB in less than 16 hours).

Sun Apr 28 14:06:41 [conn219706] command: unlock requested
Sun Apr 28 14:06:41 [rsSync] repl: old cursor isDead, will initiate a new one
Sun Apr 28 14:06:42 [rsSync] replSet syncing to: [primary address]:27017
Sun Apr 28 14:06:42 [rsSync] replSet our last op time written: Apr 28 12:01:41:1f
Sun Apr 28 14:06:42 [rsSync] replset source's GTE: Apr 28 13:01:00:20
Sun Apr 28 14:06:42 [rsSync] replSet rollback 0
Sun Apr 28 14:06:42 [rsSync] replSet ROLLBACK
Sun Apr 28 14:06:42 [rsSync] replSet rollback 1
Sun Apr 28 14:06:42 [rsSync] replSet rollback 2 FindCommonPoint
Sun Apr 28 14:06:42 [rsSync] replSet info rollback our last optime: Apr 28 12:01:41:1f
Sun Apr 28 14:06:42 [rsSync] replSet info rollback their last optime: Apr 28 14:06:42:22
Sun Apr 28 14:06:42 [rsSync] replSet info rollback diff in end of log times: -7501 seconds

FOLLOWED BY COUNTLESS SIMILAR TO:
Sun Apr 28 16:50:07 [rsSync] replSet WARNING ignoring op on rollback no _id TODO : inv.system.indexes
{ ts: Timestamp 1367150501000|31, h: -3893062772133308244, op: "i", ns: "inv.system.indexes", o: { u
nique: true, ns: "inv.fs.configs.chunks", name: "files_id_1_n_1", key:

{ files_id: 1, n: 1 }

} }

THEN THIS:
Sun Apr 28 16:49:08 [rsSync] replSet rollback error RS101 reached beginning of local oplog
Sun Apr 28 16:49:08 [rsSync] replSet them: [primary address]:27017 scanned: 40993835
Sun Apr 28 16:49:08 [rsSync] replSet theirTime: Jul 16 09:03:02 5003d8c6:13
Sun Apr 28 16:49:08 [rsSync] replSet ourTime: Apr 19 00:01:27 51708957:17
Sun Apr 28 16:49:08 [rsSync] replSet rollback 2 error RS101 reached beginning of local oplog [2]
Sun Apr 28 16:50:04 [rsSync] replSet syncing to: [primary address]:27017
Sun Apr 28 16:50:05 [rsSync] replSet our last op time written: Apr 28 12:01:41:1f
Sun Apr 28 16:50:05 [rsSync] replset source's GTE: Apr 28 13:01:00:20
Sun Apr 28 16:50:05 [rsSync] replSet rollback 0
Sun Apr 28 16:50:05 [rsSync] replSet rollback 1
Sun Apr 28 16:50:05 [rsSync] replSet rollback 2 FindCommonPoint
Sun Apr 28 16:50:05 [rsSync] replSet info rollback our last optime: Apr 28 12:01:41:1f
Sun Apr 28 16:50:05 [rsSync] replSet info rollback their last optime: Apr 28 16:50:05:6
Sun Apr 28 16:50:05 [rsSync] replSet info rollback diff in end of log times: -17304 seconds

It seems to simply repeat these patterns.



 Comments   
Comment by Randolph Tan [ 08/Jul/13 ]

Hi,

This looks like a bug that was recently fixed in the master branch (SERVER-9481). The bug is triggered when the rollback window exceeds 30 minutes.

Thanks!

Comment by Gianfranco Palumbo [ 08/Jul/13 ]

Hi Stuart,

Are you still seeing this issue?

Can you please provide more information on how this replicaset is configured and deployed?

1) rs.conf() and rs.status()
2) log files of this server as well as the primary and another healthy secondary during the same period of the error
3) location of these servers. e.g. are these servers located different datacenters, have you noticed any network problems or latency between any of the various nodes.

Please see here for further information on Rollbacks.
http://docs.mongodb.org/manual/core/replica-set-rollbacks/

Regards,
Gianfranco

Generated at Thu Feb 08 03:20:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.