[SERVER-14523] SECONDARY in RECOVERING state after a resync Created: 10/Jul/14  Updated: 16/Nov/21  Resolved: 11/Jul/14

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sridhara Huchappa Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Run a resync operation on a SECONDARY, forcing a full initial sync by cleaning up the data directory first.

Participants:

 Description   

We ran a resync on a SECONDARY in a production replica set. During the process, mongod set the oplog on the secondary to 131GB, whereas the PRIMARY oplog is only 3GB.

SECONDARY oplog
erep:RECOVERING> db.printReplicationInfo()
configured oplog size: 134152.873828125MB
log length start to end: 0secs (0hrs)
oplog first event time: Wed Jul 09 2014 09:12:53 GMT-0600 (MDT)
oplog last event time: Wed Jul 09 2014 09:12:53 GMT-0600 (MDT)
now: Thu Jul 10 2014 06:45:30 GMT-0600 (MDT)

PRIMARY oplog

erep:PRIMARY> db.printReplicationInfo()
configured oplog size: 3181.0236328125MB
log length start to end: 56786secs (15.77hrs)
oplog first event time: Wed Jul 09 2014 14:57:31 GMT-0600 (MDT)
oplog last event time: Thu Jul 10 2014 06:43:57 GMT-0600 (MDT)
now: Thu Jul 10 2014 06:44:10 GMT-0600 (MDT)

erep:RECOVERING> rs.status()
{
    "set" : "erep",
    "date" : ISODate("2014-07-10T15:33:17Z"),
    "myState" : 3,
    "members" : [
        {
            "_id" : 0,
            "name" : "wslermdb01sp:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 87654,
            "optime" : { "t" : 1405006395, "i" : 9 },
            "optimeDate" : ISODate("2014-07-10T15:33:15Z"),
            "lastHeartbeat" : ISODate("2014-07-10T15:33:17Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 2
        },
        {
            "_id" : 2,
            "name" : "wslerudh01sp:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 87654,
            "lastHeartbeat" : ISODate("2014-07-10T15:33:16Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 1
        },
        {
            "_id" : 3,
            "name" : "wslermdb02sp:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 94556,
            "optime" : { "t" : 1404918773, "i" : 1 },
            "optimeDate" : ISODate("2014-07-09T15:12:53Z"),
            "self" : true
        }
    ],
    "ok" : 1
}

Question: what is the maximum oplog size we can use for both members, and should I set them both to 131GB or go higher?
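A useful way to reason about oplog size is in terms of the replication window it buys, not a fixed number of gigabytes: the primary's 3GB oplog above currently covers ~15.77 hours of writes, which gives an estimate of the write rate it must absorb. The following sketch (plain JavaScript, using the figures from db.printReplicationInfo() above; the 72-hour target window is a hypothetical choice, not a recommendation from this ticket) shows the arithmetic:

```javascript
// Figures from db.printReplicationInfo() on the PRIMARY (see above).
const oplogSizeMB = 3181.0236328125; // configured oplog size
const windowHours = 56786 / 3600;    // log length start to end: 56786secs (15.77hrs)

// Estimated write rate the oplog must absorb.
const mbPerHour = oplogSizeMB / windowHours;

// Hypothetical target: keep ~72h of operations so a member can be
// offline over a weekend without becoming too stale to catch up.
const targetHours = 72;
const suggestedSizeMB = Math.ceil(mbPerHour * targetHours);

console.log(mbPerHour.toFixed(1) + " MB/hour");       // ≈ 201.7 MB/hour
console.log(suggestedSizeMB + " MB for a 72h window"); // 14520 MB for a 72h window
```

By this estimate, a window of several days costs on the order of tens of gigabytes here, so 131GB is far more than this write rate requires; the right value depends on how long you need a member to survive disconnected.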



 Comments   
Comment by Ramon Fernandez Marina [ 11/Jul/14 ]

sridharah, the "too stale to catch up" message means your secondary has fallen behind the primary's oplog window and is no longer replicating data. At this point it needs to be re-synced from the primary.

Note that the SERVER project is for reporting bugs or suggesting features for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-users group or on Stack Overflow with the mongodb tag; a question like this one, which needs more discussion, is best posted on the mongodb-users group.

Also, please review the documentation on the oplog and on how to change the oplog size for more information.
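For reference, the oplog-resize procedure documented for this server generation (restart the member as a standalone mongod, preserve the newest oplog entry, and recreate local.oplog.rs as a capped collection of the new size) looks roughly like this mongo-shell sketch; the 16GB size is a placeholder, and the exact steps should be taken from the tutorial linked above rather than from this outline:

```
// After restarting this member as a standalone mongod (without --replSet):
use local
// Preserve the newest oplog entry so the member keeps its sync point.
var last = db.oplog.rs.find().sort({$natural: -1}).limit(1).next();
db.temp.save(last);
// Recreate the oplog as a capped collection of the desired size (in bytes).
db.oplog.rs.drop();
db.runCommand({create: "oplog.rs", capped: true, size: 16 * 1024 * 1024 * 1024});
db.oplog.rs.save(db.temp.findOne());
// Then restart the member with its usual --replSet option.
```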

Comment by Sridhara Huchappa [ 10/Jul/14 ]

More information from the mongod log:

Thu Jul 10 14:46:45.946 [rsBackgroundSync] replSet not trying to sync from wslermdb01sp:27017, it is vetoed for 8 more seconds
Thu Jul 10 14:46:45.946 [rsBackgroundSync] replSet not trying to sync from wslermdb01sp:27017, it is vetoed for 8 more seconds
Thu Jul 10 14:46:50.947 [rsBackgroundSync] replSet not trying to sync from wslermdb01sp:27017, it is vetoed for 3 more seconds
Thu Jul 10 14:46:50.947 [rsBackgroundSync] replSet not trying to sync from wslermdb01sp:27017, it is vetoed for 3 more seconds
Thu Jul 10 14:46:54.948 [rsBackgroundSync] replSet syncing to: wslermdb01sp:27017
Thu Jul 10 14:46:54.951 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from wslermdb01sp:27017
Thu Jul 10 14:46:54.951 [rsBackgroundSync] replSet our last optime : Jul 9 09:12:53 53bd5bf5:1
Thu Jul 10 14:46:54.951 [rsBackgroundSync] replSet oldest at wslermdb01sp:27017 : Jul 10 10:30:15 53bebf97:2
Thu Jul 10 14:46:54.951 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
Thu Jul 10 14:46:54.951 [rsBackgroundSync] replSet error RS102 too stale to catch up
Thu Jul 10 14:47:03.793 [conn9467] end connection 172.23.1.245:58571 (5 connections now open)
Thu Jul 10 14:47:03.794 [initandlisten] connection accepted from 172.23.1.245:58574 #9470 (7 connections now open)
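The RS102 lines above print each optime as hex-seconds:increment (e.g. 53bd5bf5:1). Decoding the two values shows exactly why the member cannot catch up; a small plain-JavaScript check (the hex strings are copied from the log, the arithmetic is mine):

```javascript
// Optimes from the RS102 log lines, in <hex seconds>:<increment> form.
const ourLast = parseInt("53bd5bf5", 16);       // this member's last applied op
const primaryOldest = parseInt("53bebf97", 16); // oldest op still in the primary's oplog

console.log(new Date(ourLast * 1000).toISOString());       // 2014-07-09T15:12:53.000Z
console.log(new Date(primaryOldest * 1000).toISOString()); // 2014-07-10T16:30:15.000Z

// The member's last op predates the oldest entry the primary still has,
// so there is no common point to resume from and replication stops.
const gapHours = (primaryOldest - ourLast) / 3600;
console.log(gapHours.toFixed(1) + " hours too stale");     // 25.3 hours too stale
```

The decoded first timestamp matches the RECOVERING member's optimeDate in rs.status() above (2014-07-09T15:12:53Z), confirming the member stopped applying ops more than a day before the primary's 15.77-hour oplog window began.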

Comment by Sridhara Huchappa [ 10/Jul/14 ]

The error message we see in mongod.log on the SECONDARY is:
Thu Jul 10 01:55:35.189 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from wslermdb01sp:27017

Generated at Thu Feb 08 03:35:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.