for environments that cannot afford any risk of data loss or inconsistency (i.e. that would prefer unavailability instead), the arbiter could provide a mode where the election of a new primary is allowed only if we are absolutely sure that there is an up-to-date node in the replica set. this guarantee can be offered if the primary has been shut down gracefully, but not in the case of a crash.
the arbiter could handle those two situations differently:
- graceful shutdown: usual process => a new primary gets elected
- crash (or any case where there is no guarantee that a node is up to date): do nothing, and wait until either a human operator tells the arbiter to bypass the safety check and start the election process, or a timeout (e.g. 1 hour) expires.
that way, we have:
- automatic availability when we know there is no risk for the data
- when there is a risk, we give ourselves time to try restoring the crashed primary (within a defined sla period for availability)
- we can always force the process if we can confirm before the end of the sla that there are actually no risks, or if there is nothing to recover from the old primary
- at the end of the sla period for availability, the process gets forced automatically
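
the behaviour described above can be sketched as a small decision function. this is only an illustration of the proposal, not an existing arbiter api: the names `PrimaryExit`, `SafeElectionPolicy`, `should_elect`, `sla_seconds` and `operator_override` are all hypothetical, and the 3600 second default simply mirrors the "1 hour" example timeout.

```python
from dataclasses import dataclass
from enum import Enum


class PrimaryExit(Enum):
    """How the old primary went away (hypothetical classification)."""
    GRACEFUL = "graceful"  # clean shutdown: an up-to-date node is guaranteed
    CRASH = "crash"        # crash, or no guarantee that any node is up to date


@dataclass
class SafeElectionPolicy:
    """Hypothetical policy object; not part of any real arbiter."""
    sla_seconds: float = 3600.0  # assumed availability sla (the "1 hour" example)

    def should_elect(
        self,
        exit_kind: PrimaryExit,
        seconds_since_loss: float,
        operator_override: bool = False,
    ) -> bool:
        # graceful shutdown: no risk for the data, run the usual election
        if exit_kind is PrimaryExit.GRACEFUL:
            return True
        # an operator confirmed there is no risk (or nothing left to recover)
        if operator_override:
            return True
        # otherwise keep waiting until the sla period has elapsed,
        # at which point the election is forced automatically
        return seconds_since_loss >= self.sla_seconds
```

with the default policy, `SafeElectionPolicy().should_elect(PrimaryExit.CRASH, 600)` returns `False` (still inside the sla window, so the operator has time to restore the old primary), while the same call with `operator_override=True`, or with `seconds_since_loss` at or beyond 3600, returns `True`.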