Core Server / SERVER-14139

Disk failure on one node can (eventually) block a whole cluster

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: Replication, Storage
    • Replication
    • ALL

      If a disk fails in such a way that I/O blocks without ever returning (admittedly a rare occurrence), the affected mongod never gives up waiting for the I/O to complete. It still answers heartbeats as normal, so the other nodes continue to trust it even though it is permanently dysfunctional.

      A replica set or a sharded cluster can eventually lock up, and it stays locked up until the single faulty node is identified and terminated.
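
To illustrate the gap described above, here is a minimal Python sketch of a heartbeat that takes disk liveness into account: a small write-and-fsync probe runs in a separate thread with a deadline, and the node reports itself unhealthy if the probe fails or does not return in time. This is not how mongod works and not a proposed patch; `probe_disk`, `storage_is_responsive`, `heartbeat_reply`, the reply fields, and the timeout values are all hypothetical names and numbers chosen for the example.

```python
import os
import tempfile
import threading
import time


def probe_disk(directory):
    """Write and fsync a tiny file. Blocks for as long as the disk blocks."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="io-probe-")
    try:
        os.write(fd, b"ok")
        os.fsync(fd)
    finally:
        os.close(fd)
        os.unlink(path)


def storage_is_responsive(probe, timeout_s):
    """Return True only if probe() finishes without OSError within timeout_s."""
    done = threading.Event()
    result = {"ok": False}

    def run():
        try:
            probe()
            result["ok"] = True
        except OSError:
            pass  # an I/O error also counts as "not responsive"
        finally:
            done.set()

    # Daemon thread: a probe stuck in a hung I/O call cannot be cancelled,
    # so we only stop waiting for it; we do not try to kill it.
    threading.Thread(target=run, daemon=True).start()
    return done.wait(timeout_s) and result["ok"]


def heartbeat_reply(probe, timeout_s=5.0):
    """Hypothetical heartbeat payload that reflects storage health."""
    healthy = storage_is_responsive(probe, timeout_s)
    return {"ok": 1 if healthy else 0, "storage_responsive": healthy}
```

For example, `heartbeat_reply(lambda: probe_disk("/tmp"))` exercises a real directory, while `heartbeat_reply(lambda: time.sleep(60), timeout_s=0.5)` simulates a disk that never returns. The sketch only detects the hang; the stuck thread stays stuck, so a real remedy would still involve the node stepping down or being terminated.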

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            andrew.ryder@mongodb.com Andrew Ryder (Inactive)
            Votes:
            5
            Watchers:
            47

              Created:
              Updated:
              Resolved: