[SERVER-3600] "too stale to catch up" leaves state in RECOVERING Created: 15/Aug/11 Updated: 06/Dec/22 Resolved: 07/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Usability |
| Affects Version/s: | 1.9.2 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Tony Hannan | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Do | Votes: | 7 |
| Labels: | sync | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Replication
|
| Participants: |
| Description |
|
If "too stale to catch up" means the member will never catch up, then I think this deserves a state other than RECOVERING, which implies it will eventually catch up. Maybe the new state should be STALE, indicating to the user that they will need to resync it. |
| Comments |
| Comment by Steven Vannelli [ 07/Oct/19 ] | |
|
Closing this ticket as Won't Do as the parent Epic is no longer needed at this time. | |
| Comment by James Jones [ 26/Jul/16 ] | |
|
Perhaps TOO_STALE, just like the log indicates. | |
| Comment by David Hows [ 19/Oct/12 ] | |
|
The errmsg in these cases also implies that mongod could recover eventually. The message is
| |
| Comment by Tony Hannan [ 15/Aug/11 ] | |
|
Or distinguish between RECOVERING and STALE as follows: RECOVERING means it is currently applying updates to catch up. STALE means it is behind and is just sitting there doing nothing. For example, if a node is behind and can't reach a node that has a long enough oplog, then it marks itself as STALE. But as soon as it sees that node and starts applying its oplog, it marks itself as RECOVERING. | |
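A minimal sketch of the RECOVERING/STALE split proposed above, under a simplified model: oplog positions are plain integers (real replica sets compare optimes), a source is usable if its oldest retained entry is at or before the member's last applied entry, and `MemberState.STALE`, `Source`, `can_resume_from` and `next_state` are hypothetical names for illustration, not existing MongoDB states or APIs.

```python
from dataclasses import dataclass
from enum import Enum


class MemberState(Enum):
    RECOVERING = "RECOVERING"
    STALE = "STALE"  # hypothetical: proposed in this ticket, not a real MongoDB state


@dataclass
class Source:
    name: str
    oplog_start: int  # oldest entry the source still retains (simplified optime)


def can_resume_from(last_applied: int, source: Source) -> bool:
    # The member can continue only if the source still holds its last applied entry.
    return source.oplog_start <= last_applied


def next_state(last_applied: int, reachable: list[Source]) -> MemberState:
    # RECOVERING: some reachable source lets the member apply updates and catch up.
    # STALE: no reachable source overlaps, so the member just sits there waiting.
    if any(can_resume_from(last_applied, s) for s in reachable):
        return MemberState.RECOVERING
    return MemberState.STALE


# Member last applied entry 100 and can only see C, whose oplog starts at 150.
print(next_state(100, [Source("C", 150)]).value)                   # STALE
# Once B (oplog starting at 90) becomes reachable, the member can catch up again.
print(next_state(100, [Source("C", 150), Source("B", 90)]).value)  # RECOVERING
```

The state is recomputed from whatever is currently reachable, so a STALE member flips back to RECOVERING as soon as a source with a long enough oplog reappears, matching the behavior described in the comment.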
| Comment by Tony Hannan [ 15/Aug/11 ] | |
|
OK, thanks for the clarification. We could have it go to a STALE state if it can see all other members' oplogs and knows that it is too far behind all of them. Up to you and Eliot whether it is worth it. | |
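The stricter rule above (declare STALE only when every other member is visible and none of them still overlaps) can be sketched as follows. This assumes a simplified model where each member's oplog is reduced to its oldest retained entry as an integer, and `None` marks a member that is currently unreachable; `too_stale_for_all` is a hypothetical helper, not part of mongod.

```python
from typing import Optional


def too_stale_for_all(last_applied: int, oplog_starts: dict[str, Optional[int]]) -> bool:
    """Return True only when every other member's oplog is visible and each one
    starts after the entry this member applied last.

    oplog_starts maps member name -> oldest retained entry, or None if that
    member is currently unreachable (so its oplog cannot be inspected).
    """
    if not oplog_starts:
        return False  # no other members known; nothing to judge against
    for start in oplog_starts.values():
        if start is None:
            return False  # an unseen member might still have a long enough oplog
        if start <= last_applied:
            return False  # this member still overlaps, so catching up is possible
    return True


print(too_stale_for_all(100, {"B": 150, "C": 200}))   # True: behind every member
print(too_stale_for_all(100, {"B": 150, "C": None}))  # False: C not visible yet
print(too_stale_for_all(100, {"B": 90, "C": 200}))    # False: B can still serve it
```

Treating an unreachable member as "unknown" rather than "too far ahead" is what keeps this rule conservative: the member is only labeled STALE when a resync is genuinely the only way forward.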
| Comment by Kristina Chodorow (Inactive) [ 15/Aug/11 ] | |
|
The issue is that it might be able to sync from someone else, e.g., if you had 3 servers with the following oplogs: A: 3pm - 6pm. If A can only reach C, it is stale, but if it can reach B, it can start syncing from it. | |
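The point here is that staleness is a property of a (member, sync source) pair rather than of the member alone. A minimal sketch, assuming each oplog window is reduced to (oldest, newest) hours on a 24-hour clock and that a member can sync from a source only if the source still holds the member's newest entry. Only A's 3pm - 6pm window comes from the comment; the windows for B and C, and the helper `too_stale_to_sync_from`, are hypothetical and chosen purely for illustration.

```python
# Oplog windows as (oldest, newest) hours on a 24-hour clock.
# A's window (3pm - 6pm) is from the comment; B's and C's are hypothetical.
OPLOGS = {
    "A": (15, 18),
    "B": (17, 21),  # hypothetical: still holds A's newest entry (6pm)
    "C": (19, 22),  # hypothetical: has already rolled past A's newest entry
}


def too_stale_to_sync_from(member: str, source: str) -> bool:
    # The member's newest entry must still be present in the source's oplog.
    last_applied = OPLOGS[member][1]
    oldest_at_source = OPLOGS[source][0]
    return oldest_at_source > last_applied


for src in ("B", "C"):
    verdict = "stale" if too_stale_to_sync_from("A", src) else "can sync"
    print(f"A via {src}: {verdict}")
# A via B: can sync
# A via C: stale
```

With these windows, A is "too stale" only relative to C, which is why a single member-wide STALE state is hard to justify unless the member can see every potential source.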
| Comment by Tony Hannan [ 15/Aug/11 ] | |
|
Similar to this ticket but assign different state |