Details

Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 1.6.5
Component/s: None
Environment: Replica set on Ubuntu 10.04 on Amazon EC2 nodes with sets of EBS volumes.
Description
Today's (mostly EBS-related) outage in Amazon AWS caused the same failure twice in our replica set.
The root cause was that the EBS volumes, which were assembled with mdadm and mounted via LVM, became unavailable. I can see that the kernel probably leaves the Mongo server unaware of, or guessing about, what is going on here, but what happened was that:
1) the master didn't step down, probably because its network was fine and it wasn't lagging;
2) all clients kept connecting to the node that didn't work.
Would there be any way for the master, or the slaves, to detect this situation and fail over?
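For what it's worth, here is a minimal watchdog sketch of the kind of detection I have in mind, in Python with pymongo: it probes the data partition with a small synced write and, if the write fails, asks the local mongod to relinquish primacy via the replSetStepDown command. The dbpath, port, and intervals are assumptions about a deployment like ours, and it only helps as long as mongod still accepts network connections.

import os
import time

from pymongo import MongoClient
from pymongo.errors import AutoReconnect, PyMongoError

DBPATH = "/data/db"          # assumed dbpath on the EBS-backed volume
PROBE = os.path.join(DBPATH, ".disk-probe")
CHECK_INTERVAL = 10          # seconds between probes


def disk_is_writable():
    """Try a tiny synced write on the data partition."""
    try:
        with open(PROBE, "w") as f:
            f.write(str(time.time()))
            f.flush()
            os.fsync(f.fileno())
        return True
    except OSError:
        return False


def step_down():
    """Ask the local primary to relinquish its role for 120 seconds."""
    client = MongoClient("localhost", 27017)
    try:
        client.admin.command("replSetStepDown", 120)
    except AutoReconnect:
        pass  # the primary closes connections when it steps down
    except PyMongoError as exc:
        print("step down failed: %s" % exc)


if __name__ == "__main__":
    while True:
        if not disk_is_writable():
            step_down()
        time.sleep(CHECK_INTERVAL)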
Besides that, I am not aware of any option to have a slave step up manually instead of having the master step down. In the above scenario, the master didn't allow Mongo shell access because of the bad data partition, leaving no way to tell the master to step down other than powering it off.
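The only last-resort alternative to powering the box off that I can think of is something like the sketch below: isolate the stuck primary at the network level so the remaining members lose contact with it and elect a new primary. It assumes SSH to the host still works even though the shell couldn't get in, that mongod listens on the default port 27017, and that the script runs as root.

import subprocess

MONGOD_PORT = "27017"  # assumed default mongod port


def isolate_mongod():
    """Drop all inbound and outbound traffic on the mongod port."""
    for rule in (
        ["iptables", "-A", "INPUT", "-p", "tcp",
         "--dport", MONGOD_PORT, "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-p", "tcp",
         "--sport", MONGOD_PORT, "-j", "DROP"],
    ):
        subprocess.check_call(rule)


if __name__ == "__main__":
    isolate_mongod()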
Cheers