[SERVER-8084] Replica set may be unavailable for up to 30 seconds after killing primary Created: 04/Jan/13 Updated: 10/Dec/14 Resolved: 05/Mar/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.2.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Chris Heald | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL | ||||||||||||||||
| Steps To Reproduce: | You can observe this behavior with the Ruby driver's replica set test suite. Run `rake test:replica_set`, then `tail -f data//.log` once the replica set log files have been created. If the primary is killed immediately after a successful election, the replica set is unavailable for ~30 seconds: none of the remaining nodes can cast a vote while they still hold their vote leases. |
| Participants: | |||||||||||||||||
| Description |
|
This was discovered in the course of testing the Ruby driver. Issue here: https://jira.mongodb.org/browse/RUBY-523

consensus.cpp has a hardcoded 30-second "lease" that starts when a replica set member casts a vote and prevents it from casting another vote until the lease expires. This means that if you spin up a replica set, allow a primary to be elected, and then kill the primary, the entire replica set is unavailable for up to 30 seconds: the remaining nodes have to wait out the lease before they can vote to elect a new primary to replace the killed one.

I think that once an election has succeeded, nodes should clear their lease timers so that they are immediately available for another election. Additionally (or alternatively), a node that holds a vote lease should check that the node it voted for is still part of the cluster before refusing to recast its vote. If the voted-for member has disappeared, the node should cast a new vote. |
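The lease behavior and the two proposed changes can be sketched roughly as follows. This is a simplified model in Python, not the code in consensus.cpp: the `Voter` class, its method names, and the `check_liveness` flag are all hypothetical and exist only for illustration. The only value taken from the report is the 30-second lease.

```python
import time

LEASE_SECONDS = 30  # the hardcoded lease length described in this report


class Voter:
    """Hypothetical model of one member's vote lease (not MongoDB's actual code)."""

    def __init__(self, clock=time.monotonic, check_liveness=False):
        self.clock = clock                    # injectable so the lease can be tested
        self.check_liveness = check_liveness  # enables the second proposed change
        self.voted_for = None
        self.voted_at = None

    def may_vote(self, live_members):
        if self.voted_at is None:
            return True                       # no lease is held
        if self.clock() - self.voted_at >= LEASE_SECONDS:
            return True                       # the lease has expired
        if self.check_liveness and self.voted_for not in live_members:
            return True                       # proposal 2: the voted-for member is gone
        return False                          # reported behavior: refuse to vote

    def request_vote(self, candidate, live_members):
        if not self.may_vote(live_members):
            return False
        self.voted_for = candidate
        self.voted_at = self.clock()          # start a new lease
        return True

    def election_succeeded(self):
        # Proposal 1: clear the lease once an election has completed.
        self.voted_for = None
        self.voted_at = None
```

With the defaults, a member that voted for A refuses to vote for B for the next 30 seconds, even if A has left the set. Calling `election_succeeded()` after the election, or constructing the member with `check_liveness=True`, removes that wait in this model. Whether either change is safe in the real election protocol is outside what the sketch shows.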