[SERVER-10575] Two Primaries if Replica Set Heartbeats Fail in One Direction: "Old" Primary doesn't relinquish. Created: 19/Aug/13  Updated: 10/Dec/14  Resolved: 19/Feb/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Stephen Lee Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Three Node Replica Set (2 data bearing, 1 arbiter)


Issue Links:
Depends
Duplicate
duplicates SERVER-9765 Two primaries should cause the earlie... Closed
is duplicated by SERVER-8145 Two primaries for the same replica set Closed
Related
related to SERVER-12793 One way heartbeats can result in stal... Closed
Operating System: ALL
Steps To Reproduce:

1. Set up a three-node replica set (2 data-bearing, 1 arbiter) and run mongod with '--maxConns 5'.
2. Ensure that the primary (A) has a replica set member configuration _id less than that of the secondary (B).
3. Use the mongo shell to connect repeatedly to A so that replica set heartbeats from the arbiter or B fail to connect to A.
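
Step 3 can be roughly approximated with a small script instead of repeated mongo shell sessions. This is a minimal sketch, not the reporter's procedure: the `hold_connections` and `release` helpers are hypothetical, and it assumes that idle raw TCP connections count against the '--maxConns' limit in the same way mongo shell connections do, which should be verified against the server version under test.

```python
import socket


def hold_connections(host, port, count, timeout=2.0):
    """Open up to `count` TCP connections and return the sockets that connected.

    Assumption: each accepted TCP connection occupies one of the server's
    connection slots until it is closed.
    """
    held = []
    for _ in range(count):
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # The server refused or timed out; stop opening more connections.
            break
        held.append(conn)
    return held


def release(conns):
    """Close every held socket, freeing the server's connection slots."""
    for conn in conns:
        conn.close()
```

For example, `hold_connections("hostA", 27017, 5)` (hostname and port are placeholders) would try to occupy all five slots on A; calling `release` on the returned list frees them again.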

Participants:

 Description   

Because of this line, it's possible for two primaries to exist simultaneously if the "old" primary has an _id less than that of the "new" primary. The "old" primary will stay in the PRIMARY state until it is restarted.



 Comments   
Comment by Eric Milkie [ 19/Feb/14 ]

The fix was committed in the linked SERVER ticket for 2.6.0rc0.

Comment by Andreas Nilsson [ 19/Feb/14 ]

milkie, the issue sverch describes above is very real: a replica set that is incorrectly transitioned to an inconsistent SSL state, in which the nodes cannot talk to each other, will still return rs.status output depicting a functioning set.

One suggestion could be to change the state to unreachable if the bidirectional ping does not start working within a couple of health poll cycles. That is, after, say, 5 failed outgoing pings, we would set the state to unreachable.

Another option is simply to require the bidirectional ping to work.
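
The first suggestion (marking a member unreachable after a run of failed outgoing pings) could look roughly like the sketch below. It is illustrative only and does not reflect the server's actual heartbeat code: `PeerHealth`, the state names, and `FAILURE_THRESHOLD` are hypothetical, with the threshold taken from the "say 5" in the suggestion.

```python
from dataclasses import dataclass

# Hypothetical threshold taken from the "say 5" above; not a real server setting.
FAILURE_THRESHOLD = 5


@dataclass
class PeerHealth:
    """Tracks consecutive failed outgoing pings to one replica set member."""

    consecutive_failures: int = 0
    state: str = "REACHABLE"

    def record_outgoing_ping(self, ok: bool) -> str:
        """Update the peer's state from one ping result and return the new state."""
        if ok:
            # Any successful ping resets the failure streak.
            self.consecutive_failures = 0
            self.state = "REACHABLE"
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURE_THRESHOLD:
                self.state = "UNREACHABLE"
        return self.state
```

Counting consecutive failures rather than total failures means a single dropped heartbeat does not change the reported state, while a persistently one-way link does.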

Comment by Shaun Verch [ 19/Feb/14 ]

Another way to cause a heartbeat to fail in one direction is to use the "sslMode" parameter in https://jira.mongodb.org/browse/SERVER-11431 to upgrade a single node to "requireSSL" and leave the others as "allowSSL". The "allowSSL" nodes will not be able to connect to the "requireSSL" node, but the "requireSSL" node will be able to connect to the "allowSSL" node.
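
The asymmetry described in this comment can be written down as a small connectivity table. This is a sketch covering only the two modes named above, under the assumption (consistent with the comment) that "allowSSL" nodes make non-SSL outgoing connections while accepting both kinds, and "requireSSL" nodes make and accept only SSL connections. The helper names are hypothetical.

```python
# Assumed behaviour of the two modes named in the comment above.
OUTGOING_USES_SSL = {"allowSSL": False, "requireSSL": True}
ACCEPTS_PLAIN = {"allowSSL": True, "requireSSL": False}
ACCEPTS_SSL = {"allowSSL": True, "requireSSL": True}


def can_connect(client_mode, server_mode):
    """Return True if a node in client_mode can open a connection to server_mode."""
    if OUTGOING_USES_SSL[client_mode]:
        return ACCEPTS_SSL[server_mode]
    return ACCEPTS_PLAIN[server_mode]


def is_one_way(mode_a, mode_b):
    """Return True if exactly one direction between the two nodes works."""
    return can_connect(mode_a, mode_b) != can_connect(mode_b, mode_a)
```

Under these assumptions, `is_one_way("allowSSL", "requireSSL")` is true, which is the one-directional heartbeat failure this ticket is about.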
