- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.4.5
- Component/s: Replication
- Environment: ProofOfConcept: Node#1 - Primary, Node#2 - Secondary and Arbiter
 
Last week, an AWS DNS failure left one specific Amazon Availability Zone unable to resolve DNS. Other AZs WERE able to resolve DNS, including records for hosts in the "DNS-failed" zone.
In a nutshell, the following situation left both nodes in the "SECONDARY" state:
PRIMARY (db01srv02) - suddenly can't see the SECONDARY or the ARBITER. It steps down.
SECONDARY (db01srv01) - CAN see the Primary and the Arbiter. It refuses to elect itself because "db01srv02.local.:20001 would veto"
(n.b. - after upgrading to 2.4.5, I now get the more descriptive error "Sun Jul 28 12:43:36 [rsMgr] not electing self, db01srv02.local.:20001 would veto with 'I don't think db01srv01.local.:10001 is electable'")
Disclaimer - I'm not a DB Expert, so this may be expected behavior for some reason....
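The situation above can be sketched as a toy model of one-way visibility. This is not MongoDB's election code: the majority check and the veto rule (a peer vetoes a candidate it cannot reach itself) are simplifying assumptions, and `arbiter`, `electable`, `DNS_FAILURE` and `NODE_DOWN` are hypothetical names used only for illustration.

```python
# Toy model only: the rules below are assumptions, not MongoDB's real logic.

# Directed visibility: node -> set of members it can currently reach.
# DNS failure case: db01srv02 sees nobody, but the others still see it.
DNS_FAILURE = {
    "db01srv02": set(),
    "db01srv01": {"db01srv02", "arbiter"},
    "arbiter": {"db01srv01", "db01srv02"},
}

# Comparison case: db01srv02 is down entirely, so nobody sees it either.
NODE_DOWN = {
    "db01srv02": set(),
    "db01srv01": {"arbiter"},
    "arbiter": {"db01srv01"},
}


def electable(candidate, can_see):
    """Return True if the candidate could become primary in this toy model."""
    total_members = len(can_see)
    # Assumed rule 1: the candidate (counting itself) must see a majority.
    if 2 * (1 + len(can_see[candidate])) <= total_members:
        return False
    # Assumed rule 2: every peer the candidate can reach gets a veto,
    # and a peer vetoes any candidate it cannot reach itself.
    return all(candidate in can_see[peer] for peer in can_see[candidate])


if __name__ == "__main__":
    for name, topology in (("DNS failure", DNS_FAILURE), ("node down", NODE_DOWN)):
        for node in ("db01srv01", "db01srv02"):
            print(name, node, "electable:", electable(node, topology))
```

Under these assumptions, neither data node is electable in the DNS failure case, because db01srv01 can still reach db01srv02 and so counts its veto. If db01srv02 were down entirely, db01srv01 would only consult the arbiter and could be elected.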
- related to SERVER-10375: DNS failures can cause a primary-less state that wouldn't exist if a node had gone down entirely (Closed)