[SERVER-6026] Primary failed - secondaries would not take over Created: 07/Jun/12  Updated: 15/Aug/12  Resolved: 12/Jul/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.5
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Colin Howe
Assignee: Kristina Chodorow (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: ubuntu
Operating System: ALL
Participants:

 Description   

Our primary server ran out of disk space (due to us turning on verbose logging to diagnose a different problem). This then, correctly, caused the primary to step down. However, it was a single operation ahead of any of the secondaries.

We then saw that none of the secondaries would take over as primary, because the old primary would veto the election (due to being ahead). As a result, our set was left without a primary until we killed the mongo process on the primary. Freeing up space by deleting the large log files had no effect, as mongo still had a lock on the log file.
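
To make the stuck state concrete, here is a toy model of the behaviour described above. It is not MongoDB's election code. The `Member` class, the integer `optime` and the `reachable` flag are hypothetical simplifications introduced only for illustration. The single rule it encodes is the one observed in this report: a secondary cannot win while a reachable member is ahead of it.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str               # hypothetical member label
    optime: int             # simplified oplog position; higher means more recent
    reachable: bool = True  # False once the process has been killed


def blocking_members(candidate, members):
    """Return the reachable members whose optime is ahead of the candidate's."""
    return [
        m for m in members
        if m.name != candidate.name and m.reachable and m.optime > candidate.optime
    ]


def can_stand_for_election(candidate, members):
    """In this toy model, any reachable member that is ahead blocks the candidate."""
    return not blocking_members(candidate, members)


# The stepped-down primary is a single operation ahead of both secondaries.
old_primary = Member("a", optime=101)
secondary_b = Member("b", optime=100)
secondary_c = Member("c", optime=100)
members = [old_primary, secondary_b, secondary_c]

stuck = can_stand_for_election(secondary_b, members)       # blocked by "a"

# Killing the process on the old primary makes it unreachable.
old_primary.reachable = False
after_kill = can_stand_for_election(secondary_b, members)  # no longer blocked
```

In this model the set stays without a primary until either the secondaries catch up to the old primary or the old primary becomes unreachable, which matches the two outcomes requested below.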

In this situation it would have been better if either:

  • the primary had carried on replicating - allowing the secondaries to catch up and take over
  • the primary had completely died - we can deal with the rollback issue and write safety would have meant we were ok here anyway
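
If this happens again, a quick way to confirm that a stepped-down member is ahead of the others is to compare the `optimeDate` values in the `replSetGetStatus` output. The sketch below assumes the status document has already been fetched (for example with pymongo, via `client.admin.command("replSetGetStatus")`). The helper name `members_behind` and the sample host names are hypothetical, and the sample document is hand-written rather than captured from the affected set.

```python
from datetime import datetime


def members_behind(status):
    """Map member name -> seconds behind the freshest member's optimeDate."""
    members = [m for m in status["members"] if m.get("optimeDate") is not None]
    if not members:
        return {}
    newest = max(m["optimeDate"] for m in members)
    return {
        m["name"]: (newest - m["optimeDate"]).total_seconds()
        for m in members
        if m["optimeDate"] < newest
    }


# Hand-written sample shaped like a replSetGetStatus reply (hypothetical hosts).
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 6, 7, 10, 0, 1)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 6, 7, 10, 0, 0)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 6, 7, 10, 0, 0)},
    ],
}

lag = members_behind(sample_status)
```

A result in which every remaining secondary appears with a non-zero lag, while no member reports `PRIMARY`, would correspond to the state described in this ticket.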


 Comments   
Comment by Kristina Chodorow (Inactive) [ 11/Jun/12 ]

We actually have tests that make sure this works (step down the primary, make sure the secondaries catch up and become primary). If you do come across it again, please send along the logs.

Comment by Colin Howe [ 09/Jun/12 ]

Unfortunately we don't have the logs - we deleted them as the quickest way to get the server back up and running. I imagine it wouldn't be too hard to reproduce, though.

Comment by Kristina Chodorow (Inactive) [ 08/Jun/12 ]

Never mind about the version number; I see now that this is 2.0.5.

Comment by Kristina Chodorow (Inactive) [ 08/Jun/12 ]

The secondaries should continue replicating from the primary. Do you have the logs from the secondaries/primary? What version was this?
