[SERVER-3600] "too stale to catch up" leaves state in RECOVERING Created: 15/Aug/11  Updated: 06/Dec/22  Resolved: 07/Oct/19

Status: Closed
Project: Core Server
Component/s: Replication, Usability
Affects Version/s: 1.9.2
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Tony Hannan Assignee: Backlog - Replication Team
Resolution: Won't Do Votes: 7
Labels: sync
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by TOOLS-45 MongoStat does not indicate if replic... Closed
Related
related to SERVER-1933 "too stale to catch up" leaves state ... Closed
Assigned Teams:
Replication

 Description   

If "too stale to catch up" means the node will never catch up, then I think this deserves a state other than RECOVERING, which implies that it will catch up eventually. Maybe the new state should be STALE, indicating to the user that the node needs to be resynced.



 Comments   
Comment by Steven Vannelli [ 07/Oct/19 ]

Closing this ticket as Won't Do as the parent Epic is no longer needed at this time.

Comment by James Jones [ 26/Jul/16 ]

Perhaps TOO_STALE, just like the log indicates.

Comment by David Hows [ 19/Oct/12 ]

The errmsg in these cases also implies that mongod could recover eventually.

The message is

"still syncing, not yet to minValid optime 507e9a30:851"

Comment by Tony Hannan [ 15/Aug/11 ]

Or distinguish between RECOVERING and STALE as follows: RECOVERING means it is currently applying updates to catch up. STALE means it is behind and is just sitting there doing nothing. For example, if a node is behind and can't reach a node that has a long enough oplog, then it marks itself as STALE. But as soon as it sees such a node and starts applying its oplog, it marks itself as RECOVERING.
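The distinction proposed in this comment could be sketched roughly as follows. This is only an illustration of the idea, not server code: STALE is the proposed state rather than an existing one, the function name `classify_state` and its parameters are hypothetical, and oplog positions are simplified to integers.

```python
from typing import Iterable

def classify_state(node_last: int, reachable_oplog_starts: Iterable[int]) -> str:
    """Return the proposed state for a lagging node.

    node_last: position of the last entry the node has applied.
    reachable_oplog_starts: oldest oplog position still held by each
        member the node can currently reach (hypothetical input).
    """
    # A source is usable only if its oplog still reaches back to the
    # node's last applied position.
    for start in reachable_oplog_starts:
        if start <= node_last:
            return "RECOVERING"  # can apply updates from this source
    return "STALE"  # behind, and nothing reachable has a long enough oplog

# The node stopped at position 18; the only reachable member starts at 19.
state_when_isolated = classify_state(18, [19])
# A member whose oplog starts at 16 becomes reachable.
state_after_reconnect = classify_state(18, [19, 16])
```

Under this sketch the state can move from STALE back to RECOVERING as soon as a suitable source becomes reachable, which is the behavior the comment describes.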

Comment by Tony Hannan [ 15/Aug/11 ]

OK, thanks for the clarification. We could have it go to the STALE state if it can see all other members' oplogs and knows that it is too far behind all of them. Up to you and Eliot whether it is worth it.
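A minimal sketch of this stricter rule, under the assumption that a member whose oplog cannot be seen is represented as `None`. The name `proposed_state`, the integer positions, and the member labels are all hypothetical, and STALE is the proposed state, not an existing one.

```python
from typing import Dict, Optional

def proposed_state(node_last: int, member_oplog_starts: Dict[str, Optional[int]]) -> str:
    """STALE only when every other member is visible and too far ahead.

    member_oplog_starts maps a member name to the oldest position in its
    oplog, or None if that member's oplog cannot currently be seen.
    """
    visible = [s for s in member_oplog_starts.values() if s is not None]
    # Any visible member whose oplog reaches back far enough is a sync source.
    if any(start <= node_last for start in visible):
        return "RECOVERING"
    # Some member is not visible, so we cannot conclude the node is stuck.
    if len(visible) < len(member_oplog_starts):
        return "RECOVERING"
    # All members visible, none usable: the node needs a resync.
    return "STALE"

all_visible_none_usable = proposed_state(18, {"B": 19, "C": 19})
one_member_unseen = proposed_state(18, {"B": None, "C": 19})
```

The design choice here is conservative: the node is only declared STALE once it has positive evidence from every other member.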

Comment by Kristina Chodorow (Inactive) [ 15/Aug/11 ]

The issue is that it might be able to sync from someone else. For example, suppose you had 3 servers with the following oplogs:

A: 3pm - 6pm
B: 4pm - 8pm
C: 7pm - 10pm

If A can only reach C, it is stale, but if it reaches B, it can start syncing from it.
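The example above can be checked with a small sketch. It assumes a simplified rule: a node can sync from a source when the source's oplog still contains the node's last applied position. The `OplogWindow` type and `can_sync_from` function are hypothetical names for illustration, and times are written as hours on a 24-hour clock.

```python
from typing import NamedTuple

class OplogWindow(NamedTuple):
    first: int  # oldest entry still in the oplog (hour)
    last: int   # newest entry in the oplog (hour)

def can_sync_from(node: OplogWindow, source: OplogWindow) -> bool:
    # The source must still hold the entry where the node left off.
    return source.first <= node.last <= source.last

windows = {
    "A": OplogWindow(15, 18),  # 3pm - 6pm
    "B": OplogWindow(16, 20),  # 4pm - 8pm
    "C": OplogWindow(19, 22),  # 7pm - 10pm
}

a_from_b = can_sync_from(windows["A"], windows["B"])  # B starts at 4pm, before A's 6pm
a_from_c = can_sync_from(windows["A"], windows["C"])  # C starts at 7pm, after A's 6pm
```

With these windows, A is too stale relative to C but can still sync from B, which matches the example.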

Comment by Tony Hannan [ 15/Aug/11 ]

Similar to this ticket, but assigns a different state.
