[SERVER-10225] Replica set failover speed improvement Created: 16/Jul/13  Updated: 28/Oct/15  Resolved: 23/Sep/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.5
Fix Version/s: 3.1.9

Type: Improvement Priority: Major - P3
Reporter: Eric Milkie Assignee: Eric Milkie
Resolution: Done Votes: 7
Labels: elections
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-12385 election algorithm modifications Closed
is duplicated by SERVER-12163 Replica Set failover time is more tha... Closed
is duplicated by SERVER-8084 Replica set may be unavailable for up... Closed
Related
related to SERVER-11086 Election handoff to new primary, duri... Closed
is related to SERVER-9934 Slow failovers after step down due to... Closed
Backwards Compatibility: Fully Compatible
Sprint: RPL A (10/09/15)

 Description   

Election algorithm modifications will be made to reduce failover time. Target maximum downtime is 2 seconds.
The new algorithm will be activated by running replSetReconfig with a configuration whose "protocolVersion" field is set to 1.
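Here is a rough illustration of that reconfiguration step. The sketch builds a `replSetReconfig` command document from a copy of the current replica set configuration. It assumes the configuration is available as a plain Python dict (for example, the `config` field of a `replSetGetConfig` reply) and that the submitted configuration must carry a `version` higher than the current one. The helper name `build_pv1_reconfig` and the host names are hypothetical.

```python
import copy
from typing import Any, Dict


def build_pv1_reconfig(current_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a replSetReconfig command document that switches to protocolVersion 1.

    Hypothetical helper: current_config is assumed to be the replica set
    config document (e.g. the "config" field of a replSetGetConfig reply).
    """
    new_config = copy.deepcopy(current_config)  # never mutate the caller's config
    new_config["protocolVersion"] = 1           # opt in to the new election protocol
    # A reconfig is assumed to require a strictly higher config version.
    new_config["version"] = current_config["version"] + 1
    # The command name must be the first (here, only) key of the command document.
    return {"replSetReconfig": new_config}


if __name__ == "__main__":
    cfg = {
        "_id": "rs0",
        "version": 3,
        "members": [
            {"_id": 0, "host": "db0.example.net:27017"},
            {"_id": 1, "host": "db1.example.net:27017"},
            {"_id": 2, "host": "db2.example.net:27017"},
        ],
    }
    cmd = build_pv1_reconfig(cfg)
    print(cmd["replSetReconfig"]["protocolVersion"], cmd["replSetReconfig"]["version"])
```

The resulting document would then be run against the `admin` database of the current primary (with PyMongo, for instance, via `Database.command`). Copying the config first keeps the original available if the reconfig has to be rolled back.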



 Comments   
Comment by Eric Milkie [ 23/Sep/15 ]

With SERVER-18498 completed, replica set failover time can now be configured to be faster.
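The comment does not name the setting involved. One replica set configuration field that governs failover speed under protocolVersion 1 is `settings.electionTimeoutMillis` (default 10000 ms), the time limit for detecting that the primary is unreachable. Assuming that is the relevant knob, the following is a minimal sketch of lowering it in a copied config dict. The helper name `with_election_timeout` is hypothetical.

```python
import copy
from typing import Any, Dict


def with_election_timeout(current_config: Dict[str, Any], timeout_ms: int) -> Dict[str, Any]:
    """Return a copy of a replica set config with a new election timeout.

    Hypothetical helper; assumes the config already uses protocolVersion 1,
    which is where settings.electionTimeoutMillis applies.
    """
    if current_config.get("protocolVersion") != 1:
        raise ValueError("electionTimeoutMillis applies only with protocolVersion 1")
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    new_config = copy.deepcopy(current_config)
    # Create the settings sub-document if the config does not have one yet.
    new_config.setdefault("settings", {})["electionTimeoutMillis"] = timeout_ms
    # Bump the version so the result can be submitted with replSetReconfig.
    new_config["version"] = current_config["version"] + 1
    return new_config
```

A lower timeout lets secondaries notice an unreachable primary sooner. The trade-off is that elections become more likely during brief network hiccups, so the value is a tuning decision rather than a fixed recommendation.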

Comment by Jose Luis Pedrosa [ 21/Aug/14 ]

Hi all,

I would like to ask whether a manual failover (replSetStepDown) could be sped up by making the actual failover happen only after the new primary has been elected. In other words, start the election first, and once the new primary is available, fail over to it, reducing the time during which writes are unavailable.

We are evaluating MongoDB for a real-time system in which a 3-4 second manual failover can trigger failovers in larger systems (if the application using MongoDB does not respond for 3 seconds, the backup system kicks in). This is very inconvenient, because it forces us to disable failover in the other systems every time we want to compact a database on a secondary and then make it primary again.

Best Regards

Comment by Nelson Elhage [ 16/Apr/14 ]

Re: (2), the rs.freeze() method does exactly this: http://docs.mongodb.org/manual/reference/method/rs.freeze/

For (1), what we do is freeze all nodes other than the desired new master. You can also configure priorities to prevent nodes other than the desired primary from attempting to get elected.
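The freeze-based procedure described above can be sketched as a plan of admin commands. The plan runs `replSetFreeze` (the command behind `rs.freeze()`) on every secondary except the desired new primary, then `replSetStepDown` on the current primary. Treat this as a sketch under assumptions: the helper `plan_targeted_failover`, the host names, and the default durations are all hypothetical, and the plan is only data here, not sent to any server.

```python
from typing import Any, Dict, List, Tuple

Command = Dict[str, Any]


def plan_targeted_failover(
    primary: str,
    secondaries: List[str],
    target: str,
    freeze_secs: int = 120,    # hypothetical default
    stepdown_secs: int = 60,   # hypothetical default
) -> List[Tuple[str, Command]]:
    """Return (host, command) pairs that steer the next election toward target."""
    if target not in secondaries:
        raise ValueError(f"{target} is not a secondary")
    plan: List[Tuple[str, Command]] = []
    # 1. Freeze every other secondary so it will not seek election.
    for host in secondaries:
        if host != target:
            plan.append((host, {"replSetFreeze": freeze_secs}))
    # 2. Step down the primary; it stays ineligible for stepdown_secs.
    plan.append((primary, {"replSetStepDown": stepdown_secs}))
    return plan


if __name__ == "__main__":
    for host, cmd in plan_targeted_failover(
        "db0:27017", ["db1:27017", "db2:27017"], "db2:27017"
    ):
        print(host, cmd)
```

Once the target has become primary, the frozen members can be released early by running `replSetFreeze` with 0 seconds instead of waiting for the freeze to expire.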

Comment by Zeki Mokhtarzada [ 16/Apr/14 ]

I would like to suggest two possible solutions:

1) Add an optional parameter to rs.stepDown that allows the administrator to pass in the preferred new Master. This would let the other members of the cluster know that their first vote in the next election should go to the new Master.

2) Allow secondaries to stepDown(seconds). If I could step down all of the secondaries that should not be elected, the next election would happen quickly, since only one viable master would be available.
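Suggestion (1) can be approximated with the priority approach mentioned in the reply above. Before stepping the current primary down, set priority 0 on every member other than the preferred new primary and the current primary; priority-0 members never seek election. The following is a minimal sketch that does this on a copied config dict. The helper name `prefer_new_primary` is hypothetical. It also assumes the current primary is left untouched because the subsequent step-down keeps it ineligible for a while.

```python
import copy
from typing import Any, Dict


def prefer_new_primary(
    config: Dict[str, Any], target_host: str, primary_host: str
) -> Dict[str, Any]:
    """Return a config copy in which only target_host (besides the current
    primary) remains electable. Hypothetical helper for illustration."""
    hosts = [m["host"] for m in config["members"]]
    if target_host not in hosts:
        raise ValueError(f"{target_host} is not a member")
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == target_host:
            # Default priority is 1; make sure the target is electable.
            member["priority"] = max(member.get("priority", 1), 1)
        elif member["host"] != primary_host:
            member["priority"] = 0  # priority-0 members cannot become primary
    new_config["version"] = config["version"] + 1
    return new_config
```

The original priorities would need to be restored with another reconfig after the failover, so keeping the unmodified config around is part of the design.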

Generated at Thu Feb 08 03:22:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.