[SERVER-10225] Replica set failover speed improvement Created: 16/Jul/13 Updated: 28/Oct/15 Resolved: 23/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.5 |
| Fix Version/s: | 3.1.9 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 7 |
| Labels: | elections |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | RPL A (10/09/15) |
| Participants: |
| Description |
|
Election algorithm modifications will be made to reduce failover time. Target maximum downtime is 2 seconds. |
| Comments |
| Comment by Eric Milkie [ 23/Sep/15 ] |
|
With |
| Comment by Jose Luis Pedrosa [ 21/Aug/14 ] |
|
Hi all, I would like to ask whether a manual failover (replSetStepDown) could be sped up by having the actual step-down happen only after the new primary has been elected. In other words, start the election first, and once the new primary is available, fail over, which reduces the time during which writes are unavailable. We are studying using MongoDB in a real-time system in which the 3-4 seconds of a manual failover can trigger failovers in larger systems (if the application that uses mongo does not respond for 3 seconds, the backup system kicks in). This is very inconvenient, as it forces us to disable failover in other systems any time we want to compact a db on a secondary and put it back as primary. Best Regards |
| Comment by Nelson Elhage [ 16/Apr/14 ] |
|
Re (2): the rs.freeze() method is exactly this: http://docs.mongodb.org/manual/reference/method/rs.freeze/
For (1), what we do is freeze all nodes other than the desired new master. You can also configure priorities to prevent nodes other than the desired primary from attempting to get elected. |
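The freeze-then-step-down approach described in this comment could be sketched roughly as follows. This is only a planning helper under stated assumptions: the function name `plan_manual_failover`, the host names, and the 60-second defaults are hypothetical, and the code only builds the `replSetFreeze` and `replSetStepDown` command documents (the admin commands behind rs.freeze() and rs.stepDown()) without contacting any server.

```python
from typing import Dict, List, Tuple

# (target host, admin command document) -- a hypothetical representation
Command = Tuple[str, Dict[str, int]]


def plan_manual_failover(
    members: List[str],
    current_primary: str,
    desired_primary: str,
    freeze_secs: int = 60,      # assumed value, not taken from the ticket
    step_down_secs: int = 60,   # assumed value, not taken from the ticket
) -> List[Command]:
    """Return the admin commands to send, in order, as (host, command) pairs."""
    if current_primary not in members or desired_primary not in members:
        raise ValueError("both primaries must be replica set members")
    if current_primary == desired_primary:
        return []  # nothing to do

    plan: List[Command] = []
    # Freeze every secondary except the desired one, so that they do not
    # seek election while the freeze period lasts.
    for host in members:
        if host not in (current_primary, desired_primary):
            plan.append((host, {"replSetFreeze": freeze_secs}))
    # Step the current primary down last, once the other candidates are frozen.
    plan.append((current_primary, {"replSetStepDown": step_down_secs}))
    return plan


plan = plan_manual_failover(
    members=["a:27017", "b:27017", "c:27017"],
    current_primary="a:27017",
    desired_primary="b:27017",
)
for host, command in plan:
    print(host, command)
```

Each pair would then be sent to the admin database of the named host, for example through a driver's generic command helper. Ordering the freezes before the step-down is the point of the sketch: by the time the primary steps down, only the desired member is free to stand for election.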
| Comment by Zeki Mokhtarzada [ 16/Apr/14 ] |
|
I would like to suggest two possible solutions:
1) Add an optional parameter to rs.stepDown that allows the administrator to pass in the preferred new Master. This would let the other members in the cluster know that their first vote in the next election should go to the new Master.
2) Allow secondaries to stepDown(seconds). If I can stepDown all of the non-eligible secondaries, then the next election will happen quickly, since only one viable master will be available. |