[SERVER-6026] Primary failed - secondaries would not take over Created: 07/Jun/12 Updated: 15/Aug/12 Resolved: 12/Jul/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Colin Howe | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | ubuntu |
| Operating System: | ALL |
| Participants: |
| Description |
|
Our primary server ran out of disk space (because we had turned on verbose logging to diagnose a different problem). This, correctly, caused the primary to step down. However, it was a single operation ahead of all of the secondaries. We then saw that none of the secondaries would take over as primary, because the former primary would veto the election (due to being ahead). As a result, our set was left without a primary until we killed the mongo process on the former primary. Freeing up space by deleting the large log files had no effect, as mongo still had a lock on the log file. In this situation it would be good if:
|
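For anyone trying to confirm the same state, here is a minimal sketch of how it could be checked from a status document. It assumes the document has the shape returned by the `replSetGetStatus` command (a `members` array whose entries carry `name`, `stateStr`, and `optimeDate`). The helper name `summarize_replica_set`, the host names, and the timestamps are hypothetical and used only for illustration; they are not taken from the affected deployment.

```python
from datetime import datetime


def summarize_replica_set(status):
    """Summarize a replSetGetStatus-style document (hypothetical helper).

    Returns a tuple of:
      - the name of the current primary, or None if no member is PRIMARY
      - the name of the member with the newest optimeDate
      - a dict mapping member name to seconds behind that newest member
    """
    members = status["members"]

    # A set with no member in state PRIMARY cannot accept writes.
    primary = next(
        (m["name"] for m in members if m.get("stateStr") == "PRIMARY"), None
    )

    # Only compare members that actually report an optimeDate.
    with_optime = [m for m in members if m.get("optimeDate") is not None]
    newest = max(with_optime, key=lambda m: m["optimeDate"])

    lag = {
        m["name"]: (newest["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in with_optime
    }
    return primary, newest["name"], lag


# Hypothetical data modelled on the report: the former primary has stepped
# down (so every member is SECONDARY) but is still slightly ahead of the rest.
example_status = {
    "set": "rs0",
    "members": [
        {"name": "db1.example:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 6, 7, 10, 0, 1)},
        {"name": "db2.example:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 6, 7, 10, 0, 0)},
        {"name": "db3.example:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 6, 7, 10, 0, 0)},
    ],
}

primary, most_advanced, lag = summarize_replica_set(example_status)
print("primary:", primary)
print("most advanced member:", most_advanced)
print("lag in seconds:", lag)
```

With a live deployment, the input could presumably be obtained through a driver, for example `client.admin.command("replSetGetStatus")` in PyMongo. A result with no primary and a stepped-down member that is ahead of every other member would match the situation described above.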
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 11/Jun/12 ] |
|
We actually have tests that make sure this works (step down the primary, make sure the secondaries catch up and one becomes primary). If you do come across it again, please send along the logs. |
| Comment by Colin Howe [ 09/Jun/12 ] |
|
Unfortunately, we don't have the logs. We deleted them as the quickest way to get the server back up and running. I imagine it wouldn't be too hard to reproduce, though. |
| Comment by Kristina Chodorow (Inactive) [ 08/Jun/12 ] |
|
Never mind on the version number; I see now that this is 2.0.5. |
| Comment by Kristina Chodorow (Inactive) [ 08/Jun/12 ] |
|
The secondaries should continue replicating from the primary. Do you have the logs from the secondaries/primary? What version was this? |