Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.4.5
Component/s: Replication
Labels: None
Environment: ProofOfConcept: Node#1 - Primary, Node#2 - Secondary and Arbiter
Operating System: ALL
Steps To Reproduce:

I was able to reproduce the situation where:
(1) the Primary is UNABLE to see the Secondary and the Arbiter
(2) the Secondary and the Arbiter ARE able to see the Primary

I simulated this by putting the Secondary and the Arbiter on one server, and the Primary on a different server.
I tested with our current production version (2.0.7) and also with 2.2.5 and 2.4.5.


Last week, an AWS DNS resolution failure left one specific Amazon Availability Zone unable to resolve DNS. Other AZs WERE able to resolve DNS, including records of hosts in the "DNS-failed" zone.

In a nutshell, the following situation left both nodes in the "SECONDARY" state:

PRIMARY (db01srv02) - suddenly can't see the SECONDARY or the ARBITER. It steps down.
SECONDARY (db01srv01) - CAN see the Primary and the Arbiter. It refuses to elect itself because "db01srv02.local.:20001 would veto"

(n.b. - after upgrading to 2.4.5, I now get the more descriptive error "Sun Jul 28 12:43:36 [rsMgr] not electing self, db01srv02.local.:20001 would veto with 'I don't think db01srv01.local.:10001 is electable'")
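
To make the deadlock easier to reason about, here is a toy Python model of the one-way visibility described above. It is only a sketch and not MongoDB's election code: the `Member` class, the `sees_majority` and `try_elect` helpers, the member name `arbiter`, and the veto rule (a reachable member vetoes a candidate it cannot see) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    can_see: set = field(default_factory=set)  # names of members this one can reach


def sees_majority(member, members):
    # A member counts itself plus every other member it can reach.
    visible = 1 + len(member.can_see & (members.keys() - {member.name}))
    return visible * 2 > len(members)


def try_elect(candidate, members):
    """Return (elected, reason) for a candidate under the toy rules."""
    if not sees_majority(candidate, members):
        return False, "cannot see a majority"
    for name in sorted(candidate.can_see):
        voter = members[name]
        # Assumed veto rule: a reachable voter that cannot see the candidate objects.
        if candidate.name not in voter.can_see:
            return False, f"{name} would veto"
    return True, "elected"


def build(primary_sees, secondary_sees, arbiter_sees):
    # Host names come from the report; "arbiter" is a placeholder name.
    return {
        "db01srv02": Member("db01srv02", set(primary_sees)),
        "db01srv01": Member("db01srv01", set(secondary_sees)),
        "arbiter": Member("arbiter", set(arbiter_sees)),
    }


# Case from the report: the old primary sees nobody but is still reachable.
dns_failure = build([], ["db01srv02", "arbiter"], ["db01srv02", "db01srv01"])
old_primary_stays = sees_majority(dns_failure["db01srv02"], dns_failure)
dns_result = try_elect(dns_failure["db01srv01"], dns_failure)

# Contrast: the old primary is fully down, so nobody can reach it.
node_down = build([], ["arbiter"], ["db01srv01"])
down_result = try_elect(node_down["db01srv01"], node_down)

print(old_primary_stays, dns_result, down_result)
```

Under these assumed rules, the old primary loses its majority and steps down, yet db01srv01 is still vetoed because db01srv02 remains reachable and cannot see it. If db01srv02 were down entirely, the same model would let db01srv01 win the election.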

Disclaimer - I'm not a DB Expert, so this may be expected behavior for some reason....


Attachments:
mongoDBSplitBrainLog.txt (16 kB, Jul 28 2013 12:46:27 PM UTC)
mongoDBSplitBrainLog-2.4.5.txt (17 kB, Jul 28 2013 12:55:10 PM UTC)

Related to: SERVER-10375 DNS failures can cause a primary-less state that wouldn't exist if a node had gone down entirely (Closed)

Assignee: Matt Dannenberg (Inactive)
Reporter: Michael Tewner
Participants: Matt Dannenberg, Michael Tewner
Votes: 1
Watchers: 7

Created: Jul 28 2013 12:46:27 PM UTC
Updated: Aug 21 2015 02:12:17 AM UTC
Resolved: Jul 29 2013 05:59:58 PM UTC
