Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: Needs Further Definition
Affects Version/s: 2.0.5
Component/s: Replication
Labels: majority
Assigned Teams: Replication
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We run a replica set with 2 nodes + 1 arbiter. The SAN storage in our primary's DC partially failed, causing I/O operations on the primary to time out. The secondary detected that the primary was unreachable, but the arbiter still marked the primary as available because its TCP connection to the primary was still active. As a result, failover to the secondary did not happen, causing downtime.

The arbiter should periodically query the different nodes with an I/O operation to detect whether the underlying I/O subsystem is still working.
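
To illustrate the kind of check being requested, here is a minimal sketch of an I/O probe, assuming the probed node runs a small write plus fsync against its data directory and reports failure if that does not finish within a deadline. The name `io_probe` and its `directory` and `timeout_s` parameters are hypothetical and not part of MongoDB; how the arbiter would trigger the probe and consume the result is left open.

```python
import os
import tempfile
import threading


def io_probe(directory, timeout_s=2.0):
    """Hypothetical health check: True if a small write + fsync in
    `directory` completes within `timeout_s` seconds, else False."""
    result = {"ok": False}

    def _write():
        try:
            # Create a scratch file on the storage being checked.
            fd, path = tempfile.mkstemp(dir=directory, prefix="io_probe_")
            try:
                os.write(fd, b"ping")
                os.fsync(fd)  # force the write through to the device
            finally:
                os.close(fd)
                os.unlink(path)
            result["ok"] = True
        except OSError:
            # Storage errors (missing path, I/O error) count as unhealthy.
            result["ok"] = False

    # Run the write in a daemon thread so a hung device cannot block the caller.
    worker = threading.Thread(target=_write, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # A worker that is still running means the I/O is stuck.
    return (not worker.is_alive()) and result["ok"]
```

With a probe like this, a node whose storage hangs would report itself as unhealthy even while its TCP connections stay open, which is the condition that went undetected in the incident above.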

Assignee: [DO NOT USE] Backlog - Replication Team
Reporter: Leon Mergen
Participants: [DO NOT USE] Backlog - Replication Team, Eric Milkie, Leon Mergen
Votes: 1
Watchers: 4

Created: May 24 2012 10:13:05 AM UTC
Updated: Dec 06 2022 05:33:07 AM UTC
