  1. Core Server
  2. SERVER-14139

Disk failure on one node can (eventually) block a whole cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication, Storage
    • Labels:
      None
    • Operating System:
      ALL
      Description

      If a disk fails in a way that blocks I/O indefinitely without returning an error (admittedly a rare occurrence), the affected mongod never gives up waiting for the I/O to complete. It still answers heartbeats as normal, so the other nodes continue to trust it even though it is permanently dysfunctional.

      As a result, a replica set or a sharded cluster can eventually lock up, and it stays locked until the single faulty node is identified and terminated.
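
To illustrate the gap described above, here is a minimal sketch of a liveness check that ties a node's reported health to a time-bounded disk probe instead of to network reachability alone. It is not MongoDB code and does not describe how mongod works: `disk_probe`, `storage_is_responsive`, and the timeout value are hypothetical names chosen for this example.

```python
import os
import tempfile
import threading


def disk_probe(directory: str) -> None:
    """Write and fsync a small temp file; blocks for as long as the disk blocks."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the write to reach the device
    finally:
        os.close(fd)
        os.unlink(path)


def storage_is_responsive(probe, timeout_s: float) -> bool:
    """Run `probe` in a daemon thread; True only if it succeeds within timeout_s."""
    finished = threading.Event()
    outcome = {"ok": False}

    def run() -> None:
        try:
            probe()
            outcome["ok"] = True
        except OSError:
            outcome["ok"] = False  # an I/O error also counts as unhealthy
        finally:
            finished.set()

    # Daemon thread: a probe stuck in a blocked syscall cannot keep the process alive.
    threading.Thread(target=run, daemon=True).start()
    return finished.wait(timeout_s) and outcome["ok"]


if __name__ == "__main__":
    # Hypothetical usage: a node would stop answering heartbeats (or exit) on False.
    healthy = storage_is_responsive(lambda: disk_probe(tempfile.gettempdir()), 5.0)
    print("storage responsive:", healthy)
```

The probe runs in a separate thread because a write blocked in the kernel cannot be interrupted from the calling thread. The caller only waits for a bounded time and treats a timeout as a failure.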

              People

              • Votes: 5
              • Watchers: 46
