[SERVER-19103] "error RS102 too stale to catch up" can not show in rs.status() Created: 24/Jun/15  Updated: 26/Jun/15  Resolved: 24/Jun/15

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: xiaoli wang Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

1. Set oplogSize=100.
2. Start a MongoDB replica set with 1 primary and 2 secondaries.
3. Halt the primary host.
4. After 2 hours, start the halted host again.
------------ [ confusing behavior ]:
5. Sometimes the RECOVERING node shows only "infoMessage": "still syncing, not yet to minValid optime xxxxxx".
Sometimes the RECOVERING node shows "errmsg": "error RS102 too stale to catch up".
6. In the log:
6. in log:

2016-04-22T02:34:59.656+0800 [rsBackgroundSync] replSet syncing to: 172.28.11.141:27017
2016-04-22T02:34:59.657+0800 [rsBackgroundSync] replSet syncing to: 172.28.11.142:27017
2016-04-22T02:34:59.659+0800 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from 172.28.11.142:27017
2016-04-22T02:34:59.659+0800 [rsBackgroundSync] replSet our last optime : Apr 21 23:57:42 5718f876:50a
2016-04-22T02:34:59.659+0800 [rsBackgroundSync] replSet oldest at 172.28.11.142:27017 : Apr 22 01:31:16 57190e64:5
2016-04-22T02:34:59.659+0800 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2016-04-22T02:34:59.659+0800 [rsBackgroundSync] replSet error RS102 too stale to catch up
2016-04-22T02:34:59.659+0800 [rsBackgroundSync] replSet RECOVERING
2016-04-22T02:34:59.947+0800 [initandlisten] connection accepted from 172.28.0.5:51220 #6 (6 connections now open)
2016-04-22T02:35:00.656+0800 [rsSync] replSet still syncing, not yet to minValid optime 57190e64:5
 
2016-04-22T05:05:13.225+0800 [rsSync] replSet still syncing, not yet to minValid optime 57190f64:29
2016-04-22T05:05:14.131+0800 [rsBackgroundSync] replSet syncing to: 172.28.11.141:27017
2016-04-22T05:05:14.133+0800 [rsBackgroundSync] replSet syncing to: 172.28.11.142:27017
2016-04-22T05:05:14.134+0800 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from 172.28.11.142:27017
2016-04-22T05:05:14.134+0800 [rsBackgroundSync] replSet our last optime : Apr 21 23:57:42 5718f876:50a
2016-04-22T05:05:14.134+0800 [rsBackgroundSync] replSet oldest at 172.28.11.142:27017 : Apr 22 01:35:51 57190f77:35
2016-04-22T05:05:14.134+0800 [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2016-04-22T05:05:14.134+0800 [rsBackgroundSync] replSet error RS102 too stale to catch up
2016-04-22T05:05:14.225+0800 [rsSync] replSet still syncing, not yet to minValid optime 57190f77:35

Participants:

 Description   

Under the same conditions, the node is in a "stale" state, but the errmsg cannot always be obtained from rs.status().

version: 2.6.8
normal node data size: 2.3G
recovering node: 781M



 Comments   
Comment by xiaoli wang [ 26/Jun/15 ]

1. Set oplogSize=100.
2. Start a MongoDB replica set with 1 primary and 2 secondaries.
3. Halt the primary host.
4. After 2 hours, start the halted host again.
------------ [ confusing behavior ]:
Sometimes running rs.status() gives: "infoMessage": "still syncing, not yet to minValid optime xxxxxx". (In reality, the node cannot finish synchronizing by itself.)

At other times running rs.status() gives: "errmsg": "error RS102 too stale to catch up".
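
The two outcomes above can be told apart by looking at both fields for every RECOVERING member. The following is only a minimal sketch: it assumes the status document has already been fetched into a Python dict and that "infoMessage" and "errmsg" appear on the member entry, as quoted in this ticket. The helper name `recovering_messages`, the sample document, and the roles assigned to the hosts are hypothetical.

```python
def recovering_messages(status):
    """Return (name, infoMessage, errmsg) for each member in RECOVERING.

    `status` is assumed to be a dict shaped like rs.status() output.
    Missing fields come back as None, so a member that reports only
    infoMessage is easy to spot.
    """
    rows = []
    for member in status.get("members", []):
        if member.get("stateStr") == "RECOVERING":
            rows.append(
                (member.get("name"), member.get("infoMessage"), member.get("errmsg"))
            )
    return rows


# Hypothetical, trimmed status document; the message text is from this ticket,
# the member roles and the name of the restarted host are assumptions.
sample_status = {
    "members": [
        {"name": "172.28.11.141:27017", "stateStr": "PRIMARY"},
        {"name": "172.28.11.142:27017", "stateStr": "SECONDARY"},
        {
            "name": "restarted-host:27017",
            "stateStr": "RECOVERING",
            "infoMessage": "still syncing, not yet to minValid optime 57190e64:5",
        },
    ]
}

for name, info, err in recovering_messages(sample_status):
    print(name, "| infoMessage:", info, "| errmsg:", err)
```

A member that stays in RECOVERING with only an infoMessage may still be stale, so the server log remains the more reliable place to look for RS102.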

Comment by Ramon Fernandez Marina [ 24/Jun/15 ]

esala116, it seems that during those 2h there were more operations in your replica set than can fit in the small oplog you configured, so when the member comes back online it finds it has become stale and needs to go through a resync to be able to re-join the replica set. This is expected behavior, and you can either resync this node or increase the size of the oplog.

Regards,
Ramón.
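
For readers following the log in this ticket, the staleness described above can be checked by hand from the two optimes that the RS102 message prints. The sketch below is a simplified model, not the server's code: it assumes the optime is printed as hexadecimal seconds and a hexadecimal increment separated by a colon, and it treats a member as stale when its last optime is older than the oldest oplog entry on the sync source. The names `OpTime`, `parse_optime`, and `is_too_stale` are made up for illustration.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    seconds: int    # Unix timestamp part of the optime
    increment: int  # ordinal of the operation within that second


def parse_optime(text):
    """Parse '<hex seconds>:<hex increment>' as printed in the log above."""
    secs_hex, inc_hex = text.strip().split(":")
    return OpTime(int(secs_hex, 16), int(inc_hex, 16))


def is_too_stale(our_last, source_oldest):
    """True when our newest entry is older than the source's oldest entry.

    Tuples compare field by field, so seconds are compared first and the
    increment only breaks ties.
    """
    return our_last < source_oldest


# Values copied from the first RS102 block in the log.
ours = parse_optime("5718f876:50a")   # "our last optime"
oldest = parse_optime("57190e64:5")   # "oldest at 172.28.11.142:27017"

gap = oldest.seconds - ours.seconds
print(is_too_stale(ours, oldest), gap)  # True 5614
```

The gap of 5614 seconds matches the 1 h 33 min 34 s between 23:57:42 and 01:31:16 in the log: the operations the restarted member still needs have already been rotated out of the 100 MB oplog, which is why only a resync (or a larger oplog configured beforehand) helps.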
