[SERVER-9589] semi-automatic new primary election / graceful shutdown of replica sets Created: 06/May/13  Updated: 06/Dec/22  Resolved: 23/Aug/18

Status: Closed
Project: Core Server
Component/s: Admin, Replication
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Vincent Sevel Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: elections
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-9588 graceful shutdown Closed
Assigned Teams:
Replication
Participants:

 Description   

For environments that cannot afford any risk of data loss or inconsistency (i.e., that would rather be unavailable), the arbiter could provide a mode in which the election of a new primary is allowed only if it is absolutely sure that an up-to-date node exists in the replica set. This guarantee can be offered if the primary has been shut down gracefully, but not in the case of a crash.
The arbiter could handle these two situations differently:

  • graceful shutdown: usual process => a new primary gets elected
  • crash (or no certainty that some node is up to date): do nothing, and wait until a human operator tells the arbiter to bypass this safeguard and start the election process, or until a timeout (e.g., 1 hour) expires.

That way, we get:

  • automatic availability when we know there is no risk to the data
  • when there is a risk, time to try restoring the crashed primary (within a defined SLA period for availability)
  • the option to force the process if we can confirm before the end of the SLA period that there is actually no risk, or that there is nothing to recover from the old primary
  • at the end of the SLA period, the process is forced automatically


 Comments   
Comment by Spencer Brody (Inactive) [ 23/Aug/18 ]

This can be accomplished with a two-node replica set or by manipulating the number of 'votes' assigned to nodes in the replica set config.
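Both suggestions rely on the majority rule for elections: a candidate can only become primary if the voting members it can reach hold a strict majority of all votes in the set. The sketch below models only that arithmetic. The `Member` type, the `can_elect_after_failure` helper, and the member names are hypothetical; real elections also depend on priorities, optimes, and heartbeats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    votes: int  # current MongoDB versions allow 0 or 1 vote per member


def can_elect_after_failure(members, failed):
    """Return True if the surviving voters still hold a strict majority of votes."""
    total = sum(m.votes for m in members)
    surviving = sum(m.votes for m in members if m.name not in failed)
    return surviving * 2 > total


# Two data-bearing nodes, no arbiter: losing the primary leaves 1 of 2 votes.
two_node = [Member("primary", 1), Member("secondary", 1)]

# Primary/secondary/arbiter: losing the primary leaves 2 of 3 votes.
with_arbiter = [Member("primary", 1), Member("secondary", 1), Member("arbiter", 1)]

print(can_elect_after_failure(two_node, {"primary"}))      # False
print(can_elect_after_failure(with_arbiter, {"primary"}))  # True
```

Under this model, the two-node layout blocks an automatic election whenever the primary is gone, whether it crashed or was shut down cleanly, so an operator has to step in for either case.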

Comment by Vincent Sevel [ 14/May/13 ]

If the primary crashes, the secondary will become primary no matter how far behind it was in replication, and the application will immediately be available again, serving stale data.
In the context of a crash, for a database where accurate data is critical, we would rather wait (and be unavailable) than risk serving stale data. So after a crash, do not elect a new primary unless:

  • the crashed node has come back online
  • or the operator has forced a new election
  • or we have been without a primary for one hour (configurable)

Only when the node was shut down properly, and at least one node in the cluster is known to be up to date, should an automatic election of a new primary be allowed.
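The proposed election gate can be sketched as a single decision function. This is not an existing MongoDB feature: `ElectionGate`, its parameters, and the one-hour default are hypothetical names chosen to mirror the conditions listed in this comment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ElectionGate:
    # Hypothetical policy object; MongoDB has no such setting.
    timeout: timedelta = timedelta(hours=1)  # configurable SLA window

    def may_elect(self, *, graceful_shutdown, primary_back_online,
                  operator_forced, primary_lost_at, now):
        """Decide whether an automatic election of a new primary may proceed."""
        if graceful_shutdown:
            # A clean shutdown guarantees that some node is up to date.
            return True
        if primary_back_online or operator_forced:
            # The crashed node's writes are available again, or a human accepted the risk.
            return True
        # Crash case: proceed only once the SLA window has fully elapsed.
        return now - primary_lost_at >= self.timeout


gate = ElectionGate()
lost = datetime(2013, 5, 14, 9, 0)
crash = dict(graceful_shutdown=False, primary_back_online=False,
             operator_forced=False, primary_lost_at=lost)

print(gate.may_elect(**crash, now=lost + timedelta(minutes=20)))    # False: inside the window
print(gate.may_elect(**crash, now=lost + timedelta(hours=1)))       # True: window expired
print(gate.may_elect(**{**crash, "operator_forced": True}, now=lost))  # True: operator override
```

Passing `now` in explicitly keeps the timeout logic easy to test. A real implementation would also need a reliable way to know whether the last shutdown was clean, which this sketch simply takes as an input.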
Comment by Eliot Horowitz (Inactive) [ 14/May/13 ]

I think stepDown already does what you want.
If you stepDown a primary, another node cannot become primary until it has all of the writes from the primary.
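The catch-up rule described here can be modeled by comparing each secondary's last applied operation with the primary's. This sketch represents optimes as plain increasing integers and uses a hypothetical `eligible_successors` helper. In the real `replSetStepDown` command (`rs.stepDown()` in the shell), the primary waits up to `secondaryCatchUpPeriodSecs` for an electable secondary to catch up, and actual optimes are timestamp/term pairs rather than integers.

```python
def eligible_successors(primary_optime, secondary_optimes):
    """Return the secondaries that already hold every write the primary has.

    Optimes are modeled as monotonically increasing integers.
    """
    return sorted(name for name, optime in secondary_optimes.items()
                  if optime >= primary_optime)


secondaries = {"node-b": 1042, "node-c": 1040}
print(eligible_successors(1042, secondaries))  # ['node-b']
print(eligible_successors(1045, secondaries))  # []
```

An empty result corresponds to the case where no secondary catches up in time; the primary then does not step down unless `force` is set, rather than handing over to a lagging node.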

Generated at Thu Feb 08 03:20:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.