[DOCS-14177] Provide better clarity on which timeout setting results in an election, for failover Created: 02/Feb/21  Updated: 30/Oct/23  Resolved: 13/Jul/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Task Priority: Major - P3
Reporter: Paul Done Assignee: Rea Rustagi
Resolution: Done Votes: 0
Labels: reopened, server-docs-bug-bash, triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 30 weeks ago
Epic Link: DOCSP-11702

 Description   

Description

There is a lot of confusion over which replica set configuration timeout setting causes secondaries to call for an election. Specifically, the documentation lacks clarity on the role of the following two parameters:

  • settings.heartbeatTimeoutSecs
  • settings.electionTimeoutMillis

See docs page: https://docs.mongodb.com/manual/reference/replica-configuration/#rsconf.settings.electionTimeoutMillis

According to https://groups.google.com/g/mongodb-user/c/RwLZvRV7DAg, for replication protocol version 1 (pv1), "the only knob that controls failover sensitivity in pv1 is electionTimeoutMillis" and "In v1, you can expect the timeout to be at most electionTimeoutMillis". As per https://docs.mongodb.com/manual/reference/replica-set-protocol-versions/, pv1 was the default from 3.2 and is the only protocol version supported from version 4.0.
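
To make the point above concrete, here is a minimal Python sketch of changing only electionTimeoutMillis in a replica set configuration document. It is an illustration under stated assumptions, not the documented procedure. The helper name with_election_timeout and the trimmed-down example config are hypothetical. The 10000 ms default is the value given on the replica-configuration docs page. The version bump reflects my understanding that the raw replSetReconfig command expects a higher version than the current one.

```python
import copy

# Documented default for settings.electionTimeoutMillis (10 seconds).
DEFAULT_ELECTION_TIMEOUT_MS = 10_000


def with_election_timeout(config, timeout_ms):
    """Return a copy of a replica set config with a new electionTimeoutMillis.

    `config` is assumed to be the document found under the "config" key of a
    replSetGetConfig reply. This helper is hypothetical, for illustration only.
    """
    if timeout_ms <= 0:
        raise ValueError("electionTimeoutMillis must be a positive integer")
    new_config = copy.deepcopy(config)  # leave the caller's document untouched
    new_config.setdefault("settings", {})["electionTimeoutMillis"] = int(timeout_ms)
    # Assumption: the raw replSetReconfig command wants a higher version number.
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical, trimmed-down pv1 config; a real one also has a "members" array.
example = {
    "_id": "rs0",
    "version": 3,
    "protocolVersion": 1,
    "settings": {
        "heartbeatTimeoutSecs": 10,
        "electionTimeoutMillis": DEFAULT_ELECTION_TIMEOUT_MS,
    },
}

updated = with_election_timeout(example, 5000)
print(updated["settings"])
```

Assuming a PyMongo client connected to the primary, the input document would come from client.admin.command("replSetGetConfig")["config"] and the result would be applied with client.admin.command("replSetReconfig", updated). heartbeatTimeoutSecs is deliberately left alone, since per the thread quoted above it is not the knob that controls failover sensitivity in pv1.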

This needs to be made clearer in the docs at https://docs.mongodb.com/manual/reference/replica-configuration/#rsconf.settings.electionTimeoutMillis for the properties "settings.heartbeatTimeoutSecs" and "settings.electionTimeoutMillis".

At the moment, the docs do say "NOTE For pv1, settings.electionTimeoutMillis has a greater influence on whether the secondary members call for an election than the settings.heartbeatTimeoutSecs". Unfortunately, this statement is vague and gives the reader nothing concrete to act on.

Also of note is the core server source code README for replication, https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/README.md#user-content-communication, which talks about "Check the liveness of the other nodes (heartbeats)". Again, this is a bit vague, but I suggest talking to the core replication developers who authored it about providing a far better description of what heartbeatTimeoutSecs is for and how it should be used, if at all.

Scope of changes

Specify that electionTimeoutMillis is the only knob that controls failover sensitivity in pv1.



 Comments   
Comment by Rea Rustagi [ 12/Jul/23 ]

https://github.com/10gen/docs-mongodb-internal/pull/3816

Comment by Sarah Olson [ 03/Nov/22 ]

Thanks dmitry.ryabtsev@mongodb.com!

 

Comment by Education Bot [ 31/Oct/22 ]

Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you!
