Core Server / SERVER-5917

More advanced availability checks required for Arbiter


    Details

      Description

      We have a replica set with two data-bearing nodes and one arbiter. The SAN storage in our primary's data center partially failed, causing I/O operations to time out. The secondary detected that the primary was unreachable, but the arbiter still considered the primary available because its TCP connection to the primary was still active. As a result, failover to the secondary did not happen, causing downtime.
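      The vote arithmetic behind this failure can be illustrated with a simplified model. This is a hypothetical sketch, not MongoDB's election code: it assumes a member grants its vote to the candidate only if it can be reached and no longer believes the current primary is healthy. With three voting members a candidate needs two votes, so an arbiter that trusts the primary's live TCP connection leaves the secondary one vote short.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    reachable: bool             # can the candidate reach this member at all?
    sees_primary_healthy: bool  # this member's own view of the current primary


def votes_for_candidate(others: list) -> int:
    """Count votes the candidate receives from the other members.

    Hypothetical rule (simplification): a member votes for the candidate only
    if it is reachable and no longer believes the current primary is healthy.
    """
    return sum(1 for m in others if m.reachable and not m.sees_primary_healthy)


def failover_succeeds(total_voters: int, others: list) -> bool:
    majority = total_voters // 2 + 1          # 3 voters -> 2 votes needed
    votes = 1 + votes_for_candidate(others)   # the candidate votes for itself
    return votes >= majority


# Scenario from this issue: the primary's I/O hangs, but its TCP connections stay up.
primary = Member("primary", reachable=False, sees_primary_healthy=True)
arbiter_tcp_only = Member("arbiter", reachable=True, sees_primary_healthy=True)
arbiter_io_check = Member("arbiter", reachable=True, sees_primary_healthy=False)

print(failover_succeeds(3, [primary, arbiter_tcp_only]))  # False
print(failover_succeeds(3, [primary, arbiter_io_check]))  # True
```

      In this model the only difference between the two outcomes is the arbiter's view of the primary, which is the gap this issue describes.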

      The arbiter should periodically query each node with an operation that performs I/O, to detect whether that node's underlying I/O subsystem is still working, rather than relying only on TCP connectivity.
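      A minimal sketch of the proposed check, assuming a hypothetical `probe` callable that performs a small storage operation on the node (for example, a write that must reach disk). None of these names are MongoDB APIs, and the timeout value is illustrative. The node counts as available only if its TCP connection is alive and the probe finishes within the timeout; a probe stalled on unresponsive storage is reported as a failure instead of blocking the checker.

```python
import concurrent.futures
import time
from typing import Callable

# Shared worker pool: a stalled probe keeps occupying one worker thread until
# it returns, so the pool is created once instead of per check.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def io_health_check(probe: Callable[[], None], timeout_s: float) -> bool:
    """Return True if `probe` completes within `timeout_s` seconds without raising."""
    future = _pool.submit(probe)
    try:
        future.result(timeout=timeout_s)
        return True
    except concurrent.futures.TimeoutError:
        return False  # I/O appears stalled: treat the node as unavailable
    except Exception:
        return False  # the probe itself failed


def node_available(tcp_alive: bool, probe: Callable[[], None], timeout_s: float) -> bool:
    # A live TCP connection alone is not enough; storage must also respond.
    return tcp_alive and io_health_check(probe, timeout_s)


def healthy_probe() -> None:
    pass  # stands in for a small write that completes quickly


def stalled_probe() -> None:
    time.sleep(0.5)  # stands in for a write blocked on unresponsive SAN storage


print(node_available(True, healthy_probe, timeout_s=0.1))  # True
print(node_available(True, stalled_probe, timeout_s=0.1))  # False
```

      Running the probe on a worker thread with a bounded wait is one design choice among several; the key property is that the checker itself never blocks on the failing storage.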

            People

            Assignee:
            backlog-server-repl Backlog - Replication Team
            Reporter:
            solatis Leon Mergen
            Participants:
            Votes:
            1
            Watchers:
            4

              Dates

              Created:
              Updated: