Core Server / SERVER-2978

Stepping down when (network) storage is unavailable


Details

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 1.6.5
    • Component/s: Replication
    • Labels: None
    • Environment: Replica set on Ubuntu 10.04 on Amazon EC2 nodes with sets of EBS volumes.

    Description

      Today's (mostly EBS-related) outage in Amazon AWS caused the same failure twice in our replica set.

      The root cause was that the EBS volumes, which we had assembled with mdadm and LVM, became unavailable. I suspect the kernel leaves the Mongo server unaware of, or at best guessing at, what is going on, but what happened was this:

      1) the master didn't step down, probably because its network was fine and it wasn't lagging
      2) all clients kept connecting to the node that was no longer working

      Would there be any way for the master, or the slaves, to detect this situation and fail over?

      Besides that, I am not aware of any option to make a slave step up manually, as opposed to having the master step down. In the scenario above, the master refused Mongo shell access because of the bad data partition, leaving no way to tell it to step down other than powering it off.
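One way the detection asked about above might work, sketched under assumptions rather than taken from MongoDB itself: an external watchdog periodically writes and fsyncs a small file in the data directory and treats either an error or a timeout as unhealthy storage. The probe runs in a separate daemon thread because a write to a vanished EBS volume can block indefinitely instead of returning an error, which is consistent with the master becoming unreachable from the shell. The function name `storage_is_healthy`, the `.storage_probe` file name and the 5-second timeout are all hypothetical.

```python
import os
import tempfile
import threading


def _write_probe(path: str, result: dict) -> None:
    """Write and fsync a small file, then record success or the error."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the write through to the underlying device
        finally:
            os.close(fd)
        os.remove(path)
        result["ok"] = True
    except OSError as exc:
        result["error"] = exc


def storage_is_healthy(data_dir: str, timeout_s: float = 5.0) -> bool:
    """Return False if a write+fsync in data_dir fails or does not finish in time.

    Hypothetical helper: data_dir stands in for the mongod dbpath.
    """
    probe_path = os.path.join(data_dir, ".storage_probe")
    result: dict = {}
    # Daemon thread: if the device hangs, the thread may block forever,
    # so it is only ever waited on with a timeout.
    worker = threading.Thread(
        target=_write_probe, args=(probe_path, result), daemon=True
    )
    worker.start()
    worker.join(timeout_s)
    return result.get("ok", False)


if __name__ == "__main__":
    # Hypothetical usage: probe a scratch directory standing in for the dbpath.
    with tempfile.TemporaryDirectory() as scratch:
        print("healthy" if storage_is_healthy(scratch) else "unhealthy")
```

What the watchdog should do when the probe fails (for example, powering off or isolating the node so the remaining members elect a new primary) is a separate policy decision and is not shown here.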

      Cheers


    People

      Assignee: Unassigned
      Reporter: Pieter Ennes (skion)
