[SERVER-4700] Critical replication failures should bring server into RECOVERING state
| Created: | 17/Jan/12 | Updated: | 30/Mar/12 | Resolved: | 18/Jan/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Description |
|
The dreaded "replSet error RS102 too stale to catch up" should, in my opinion, mark the server as RECOVERING or as failed in some other way, because there is currently no other way to determine whether replication occurred. Silent failures, on the other hand, can lead to data loss in emergencies. |
| Comments |
| Comment by Aristarkh Zagorodnikov [ 18/Jan/12 ] |
|
Sorry, I checked the logs, and the server really did move to the RECOVERING state. I think I confused that with the fact that I never got a warning from MMS. I am very sorry for the false report again. |
| Comment by Aristarkh Zagorodnikov [ 18/Jan/12 ] |
|
It seldom occurs with one of our replica sets; I'll post logs when (if) it happens again. |
| Comment by Eliot Horowitz (Inactive) [ 17/Jan/12 ] |
|
RS102 definitely makes the server go into RECOVERING
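
Since the state change can be missed when no monitoring alert fires, a small check on the `replSetGetStatus` reply may help. The following is a minimal sketch, not something from this ticket: it assumes the reply has a `members` list whose entries carry `name` and `stateStr`, and the function name, the set of "healthy" states, and the sample host names are all hypothetical.

```python
# Member states treated as "replicating normally" in this sketch (an assumption,
# adjust as needed). Anything else, e.g. RECOVERING, is reported.
HEALTHY_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}


def members_needing_attention(status):
    """Return (name, stateStr) pairs for members outside HEALTHY_STATES.

    `status` is a dict shaped like the replSetGetStatus reply.
    """
    return [
        (member.get("name"), member.get("stateStr"))
        for member in status.get("members", [])
        if member.get("stateStr") not in HEALTHY_STATES
    ]


# Hypothetical reply, trimmed to the fields used above. With PyMongo a real one
# could be fetched via client.admin.command("replSetGetStatus").
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1.example.net:27017", "state": 1, "stateStr": "PRIMARY"},
        {"name": "db2.example.net:27017", "state": 3, "stateStr": "RECOVERING"},
        {"name": "db3.example.net:27017", "state": 2, "stateStr": "SECONDARY"},
    ],
}

for name, state in members_needing_attention(sample_status):
    print(f"{name} is {state}")
```

Run periodically, a check like this would surface a member that went into RECOVERING after RS102 even if no external warning arrives.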