- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 2.2.2
- Component/s: Replication
- ALL
This was discovered while testing the Ruby driver; see https://jira.mongodb.org/browse/RUBY-523.
consensus.cpp hardcodes a 30-second "lease" that starts when a replica set member casts a vote and prevents that member from casting another vote until the lease expires. As a result, if you spin up a replica set, allow a primary to be elected, and then kill that primary, the entire replica set can be unavailable for up to 30 seconds: the remaining nodes have to wait out the lease before they can vote to elect a new primary to replace the one that was killed.
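The behaviour described above can be illustrated with a toy model. This is only a sketch of the lease as described in this ticket, not the code in consensus.cpp; the `Voter` class, its fields, and `request_vote` are hypothetical names, and time is passed in explicitly as seconds.

```python
LEASE_SECONDS = 30  # the hardcoded lease duration described in this ticket


class Voter:
    """Toy model of one replica set member's vote lease (not MongoDB code)."""

    def __init__(self):
        self.voted_for = None  # candidate this member last voted for
        self.voted_at = None   # time in seconds at which that vote was cast

    def request_vote(self, candidate, now):
        """Return True if a vote is granted to `candidate` at time `now`."""
        lease_active = (
            self.voted_at is not None
            and now - self.voted_at < LEASE_SECONDS
        )
        if lease_active:
            # Still inside the lease: refuse to vote again, even if the
            # member we voted for is no longer alive.
            return False
        self.voted_for = candidate
        self.voted_at = now
        return True


voter = Voter()
first = voter.request_vote("A", now=0)    # A is elected primary
during = voter.request_vote("B", now=5)   # A was killed; B asks 5 s later
after = voter.request_vote("B", now=30)   # the lease has run out
```

In this model `during` is refused and `after` is granted, which is the failover gap of up to 30 seconds reported here.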
I think that once an election has succeeded, nodes should clear their lease timers so that they are immediately available for another election. Additionally (or alternatively), a node that holds a vote lease should check that the member it voted for is still part of the cluster before refusing to vote again. If the voted-for member has disappeared, the node should cast a new vote.
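A minimal sketch of the two suggestions, under the same toy-model assumptions: `LeasedVoter`, `clear_lease`, and the `reachable_members` argument are hypothetical and do not correspond to real server code. How a node would learn that an election succeeded, or which members are reachable, is left out.

```python
LEASE_SECONDS = 30  # lease duration described in this ticket


class LeasedVoter:
    """Toy model of a member with both proposed changes (not MongoDB code)."""

    def __init__(self):
        self.voted_for = None  # candidate this member last voted for
        self.voted_at = None   # time in seconds at which that vote was cast

    def clear_lease(self):
        """Suggestion 1: drop the lease once an election has succeeded."""
        self.voted_for = None
        self.voted_at = None

    def request_vote(self, candidate, now, reachable_members):
        """Return True if a vote is granted to `candidate` at time `now`."""
        lease_active = (
            self.voted_at is not None
            and now - self.voted_at < LEASE_SECONDS
        )
        # Suggestion 2: the lease only binds while the member we voted
        # for is still part of the cluster.
        if lease_active and self.voted_for in reachable_members:
            return False
        self.voted_for = candidate
        self.voted_at = now
        return True


# Suggestion 2: the vote is recast as soon as A has disappeared.
v = LeasedVoter()
granted_a = v.request_vote("A", now=0, reachable_members={"A", "B", "C"})
refused_b = v.request_vote("B", now=5, reachable_members={"A", "B", "C"})
granted_b = v.request_vote("B", now=5, reachable_members={"B", "C"})

# Suggestion 1: clearing the lease after a successful election.
w = LeasedVoter()
w.request_vote("A", now=0, reachable_members={"A", "B", "C"})
w.clear_lease()
granted_after_clear = w.request_vote("B", now=1, reachable_members={"A", "B", "C"})
```

Either change alone removes the wait in this model; whether both are safe together in a real election protocol is not something the sketch addresses.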
- duplicates
- SERVER-10225 Replica set failover speed improvement
- Closed
- is related to
- RUBY-523 Runtime improvements for replica set and sharded cluster test suites
- Closed