[SERVER-10575] Two Primaries if Replica Set Heartbeats Fail in One Direction: "Old" Primary doesn't relinquish. Created: 19/Aug/13 Updated: 10/Dec/14 Resolved: 19/Feb/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Stephen Lee | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Three-node replica set (2 data-bearing, 1 arbiter) |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | 1. Set up a three-node replica set (2 data-bearing, 1 arbiter) and run mongod with '--maxConns 5'. |
| Participants: | |
| Description |
|
Because of this line, two primaries can exist simultaneously if the "old" primary has an _id lower than that of the "new" primary. The "old" primary stays in the PRIMARY state until it is restarted. |
| Comments |
| Comment by Eric Milkie [ 19/Feb/14 ] |
|
The fix was committed in the linked SERVER ticket for 2.6.0rc0. |
| Comment by Andreas Nilsson [ 19/Feb/14 ] |
|
milkie, the issue sverch describes above is very real: a replica set that is incorrectly transitioned to an inconsistent SSL state, where the nodes cannot talk to each other, will still return an rs.status depicting a functioning set. One suggestion would be to change the state to unreachable if the bidirectional ping does not start working over a couple of health poll cycles; that is, after say 5 failed outgoing pings we set the state to unreachable. Another option is simply to require the bidirectional ping to work. |
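The first suggestion above could be sketched roughly as follows. This is only an illustration of the counting idea, not MongoDB's heartbeat code: the class name PeerHealth, the state strings, and the FAILURE_THRESHOLD constant are all hypothetical, and the threshold of 5 is taken from the "say 5 failed outgoing pings" wording.

```python
from dataclasses import dataclass

# Hypothetical threshold, taken from "say 5 failed outgoing pings".
FAILURE_THRESHOLD = 5


@dataclass
class PeerHealth:
    """Tracks outgoing heartbeat results for one peer, as seen locally.

    Hypothetical model; the names do not come from the MongoDB source.
    """

    consecutive_outgoing_failures: int = 0
    state: str = "UP"

    def record_outgoing_ping(self, ok: bool) -> str:
        """Record one outgoing ping result and return the peer's state."""
        if ok:
            # A successful ping resets the counter and restores the peer.
            self.consecutive_outgoing_failures = 0
            self.state = "UP"
        else:
            self.consecutive_outgoing_failures += 1
            # Only mark the peer unreachable after repeated failures, so a
            # single dropped heartbeat does not change the reported state.
            if self.consecutive_outgoing_failures >= FAILURE_THRESHOLD:
                self.state = "UNREACHABLE"
        return self.state
```

With this rule, a peer that still sends heartbeats to us but that we cannot reach ourselves would be reported as unreachable after five poll cycles instead of appearing healthy.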
| Comment by Shaun Verch [ 19/Feb/14 ] |
|
Another way to cause a heartbeat to fail in one direction is to use the "sslMode" parameter in https://jira.mongodb.org/browse/SERVER-11431 to upgrade a single node to "requireSSL" and leave the others as "allowSSL". The "allowSSL" nodes will not be able to connect to the "requireSSL" node, but the "requireSSL" node will be able to connect to the "allowSSL" nodes. |
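The asymmetry described in this comment can be modelled with a small table. This is a simplified sketch that assumes an "allowSSL" node opens plaintext outgoing connections and accepts both kinds, while a "requireSSL" node opens SSL connections and accepts only SSL; the function and dictionary names are hypothetical and nothing here calls MongoDB.

```python
# Simplified, assumed behaviour of the two modes named in the comment.
OUTGOING_USES_SSL = {"allowSSL": False, "requireSSL": True}
ACCEPTS_PLAINTEXT = {"allowSSL": True, "requireSSL": False}
ACCEPTS_SSL = {"allowSSL": True, "requireSSL": True}


def can_connect(client_mode: str, server_mode: str) -> bool:
    """Return True if a node in client_mode can connect to one in server_mode."""
    if OUTGOING_USES_SSL[client_mode]:
        return ACCEPTS_SSL[server_mode]
    return ACCEPTS_PLAINTEXT[server_mode]


def heartbeat_directions(mode_a: str, mode_b: str) -> dict:
    """Report which heartbeat directions work between two nodes."""
    return {
        "a_to_b": can_connect(mode_a, mode_b),
        "b_to_a": can_connect(mode_b, mode_a),
    }
```

For one "requireSSL" node and one "allowSSL" node, heartbeat_directions("requireSSL", "allowSSL") reports the first direction as working and the second as failing, which is the one-directional heartbeat failure this ticket is about.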