Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.2.2
Component/s: Replication
Labels:
None

Operating System:
ALL
Steps To Reproduce:

You can observe this behavior with the Ruby driver's replica set test suite. Run the suite with `rake test:replica_set`, then run `tail -f data//.log` once the replica set log files have been created. If the primary is killed immediately after a successful election, the replica set is unavailable for ~30 seconds, because none of the remaining nodes can cast a vote while they still hold their leases.

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

This was discovered while testing the Ruby driver; see https://jira.mongodb.org/browse/RUBY-523

consensus.cpp has a hardcoded 30-second "lease" that starts when a replica set member casts a vote and prevents it from casting another vote until the lease expires. This means that if you spin up a replica set, allow a primary to be elected, and then kill the primary, the entire replica set is unavailable for up to 30 seconds, because the nodes have to wait out the lease before they can cast another vote to elect a new primary to replace the killed one.
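
The lease behavior described above can be modeled in a few lines. This is a minimal sketch, not the code in consensus.cpp: the class `VoteLease`, its methods, and the node names are hypothetical and exist only to illustrate why a second election stalls.

```python
LEASE_SECONDS = 30.0  # the hardcoded lease duration described in this report


class VoteLease:
    """Toy model of one member's vote lease (hypothetical, not server code)."""

    def __init__(self, lease_seconds=LEASE_SECONDS):
        self.lease_seconds = lease_seconds
        self.voted_for = None  # id of the candidate last voted for
        self.voted_at = None   # time of that vote, in seconds

    def can_vote(self, now):
        # A member that has never voted, or whose lease has expired, may vote.
        if self.voted_at is None:
            return True
        return now - self.voted_at >= self.lease_seconds

    def try_vote(self, candidate, now):
        # Refuse while the lease from the previous vote is still held.
        if not self.can_vote(now):
            return False
        self.voted_for = candidate
        self.voted_at = now
        return True


member = VoteLease()
first = member.try_vote("node-a", now=0.0)    # election that picks node-a
blocked = member.try_vote("node-b", now=5.0)  # node-a is killed 5 s later
later = member.try_vote("node-b", now=30.0)   # lease has expired
print(first, blocked, later)  # True False True
```

In this model the member cannot help elect a replacement until the full 30 seconds have passed, even though the node it voted for is already gone.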

I think that once an election has succeeded, nodes should clear their lease timers so that they are immediately available for another election. Additionally (or alternatively), a node that holds a vote lease should check that the node it voted for is still part of the cluster before refusing to recast its vote. If the voted-for member has disappeared, the node should cast a new vote.
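
Both suggestions can be sketched against a toy model of the lease. This is an illustration under assumptions, not a patch for consensus.cpp: `VoteLease`, `election_succeeded`, and the `live_members` argument are hypothetical names, and how a real member learns which nodes are still reachable is left out.

```python
class VoteLease:
    """Toy vote lease with the two proposed changes (hypothetical model)."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self.voted_for = None  # id of the candidate last voted for
        self.voted_at = None   # time of that vote, in seconds

    def election_succeeded(self):
        # Proposal 1: clear the lease once an election has completed.
        self.voted_for = None
        self.voted_at = None

    def try_vote(self, candidate, now, live_members):
        lease_held = (
            self.voted_at is not None
            and now - self.voted_at < self.lease_seconds
        )
        # Proposal 2: honor the lease only while the voted-for member
        # is still part of the cluster.
        if lease_held and self.voted_for in live_members:
            return False
        self.voted_for = candidate
        self.voted_at = now
        return True


everyone = {"node-a", "node-b", "node-c"}

# Proposal 2: the lease blocks a new vote only while node-a is alive.
m = VoteLease()
voted = m.try_vote("node-a", 0.0, everyone)
still_blocked = m.try_vote("node-b", 5.0, everyone)
revoted = m.try_vote("node-b", 5.0, {"node-b", "node-c"})

# Proposal 1: clearing the lease after the election frees the member at once.
m2 = VoteLease()
m2.try_vote("node-a", 0.0, everyone)
m2.election_succeeded()
free_again = m2.try_vote("node-b", 1.0, everyone)
```

Either change on its own removes the 30-second wait in the kill-the-primary scenario from this report; the sketch applies both only to show that they do not conflict.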

duplicates: SERVER-10225 Replica set failover speed improvement (Closed)
is related to: RUBY-523 Runtime improvements for replica set and sharded cluster test suites (Closed)

Assignee: Unassigned
Reporter: Chris Heald
Participants: Chris Heald
Votes: 1
Watchers: 5

Created: Jan 04 2013 08:36:03 PM UTC
Updated: Dec 10 2014 11:04:23 PM UTC
Resolved: Mar 05 2014 09:43:16 PM UTC
