[SERVER-32794] Make timeouts unrelated to elections not depend on election timeout
Created: 19/Jan/18  Updated: 30/Oct/23  Resolved: 22/Jan/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.6.3, 3.7.2

Type: Improvement
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Judah Schvimer
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-30642 Raise election timeouts as a way to p... Closed
Related
related to SERVER-32691 Create passthrough for w="majority" w... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v3.6, v3.4
Sprint: Repl 2018-01-29

Description

For testing, it can be helpful to increase the election timeout to infinity. Several timeouts that are unrelated to elections are calculated from the election timeout, which prevents this. We should add a maximum to these timeouts, potentially based on the heartbeat interval, as the TopologyCoordinator code in the last link below already does:
https://github.com/mongodb/mongo/blob/9b6f404d30b944def9bcc77ebc8277fb97471080/src/mongo/db/repl/sync_source_feedback.cpp#L56
https://github.com/mongodb/mongo/blob/9b6f404d30b944def9bcc77ebc8277fb97471080/src/mongo/db/repl/oplog_fetcher.cpp#L74
https://github.com/mongodb/mongo/blob/9b6f404d30b944def9bcc77ebc8277fb97471080/src/mongo/db/repl/topology_coordinator.cpp#L1018
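
The idea of adding a maximum can be sketched roughly as follows. This is only an illustration, not the server's code: the names `kMaxNonElectionTimeout` and `boundedNonElectionTimeout` are hypothetical, and the 30-second ceiling is an assumed value that merely echoes a figure mentioned in the comments.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical ceiling for timeouts that are not about elections.
// The value is an assumption chosen for illustration.
const Milliseconds kMaxNonElectionTimeout{30 * 1000};

// Derives a non-election timeout from the election timeout, but never lets it
// exceed the ceiling, so an effectively infinite election timeout does not
// produce an effectively infinite derived timeout.
inline Milliseconds boundedNonElectionTimeout(Milliseconds electionTimeout) {
    assert(electionTimeout >= Milliseconds::zero());
    return std::min(electionTimeout, kMaxNonElectionTimeout);
}
```

With an ordinary election timeout the derived value is unchanged; only very large election timeouts are clamped.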



Comments
Comment by Githook User [ 26/Jan/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32794 Make timeouts unrelated to elections not depend on election timeout

(cherry picked from commit f3b504948c0cef40deffb4786ebdda6797625142)
Branch: v3.6
https://github.com/mongodb/mongo/commit/390557cbb338c9924b142b7a6f8b0808e87b147d

Comment by Githook User [ 22/Jan/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-32794 Make timeouts unrelated to elections not depend on election timeout
Branch: master
https://github.com/mongodb/mongo/commit/f3b504948c0cef40deffb4786ebdda6797625142

Comment by Spencer Brody (Inactive) [ 22/Jan/18 ]

schwerin, for #21, it probably isn't strictly necessary to put a cap on the upper bound of this value, but then again that's probably true for all of these. My thought was just that since this is the primary channel for conveying liveness information through the set it makes sense to keep that channel somewhat active. The current plan is to put the upper bound at 1 minute, which should already be far higher than anyone is likely to use in practice.

Comment by Judah Schvimer [ 22/Jan/18 ]

siyuan.zhou, #21 is sync source feedback, not sync source resolver. Sync source feedback, i.e. replSetUpdatePosition, seems very related to primaries stepping down.

Comment by Siyuan Zhou [ 19/Jan/18 ]

I agree with Spencer on "heartbeatTimeoutPeriod". I think topology detection should be separated from the decision of whether the primary should step down. For #21, should the sync source resolver rely on the heartbeat's parameters instead of the election timeout? It seems a separate issue from consensus.

Comment by Andy Schwerin [ 19/Jan/18 ]

For #21, why do we care about liveness in this scenario?

Comment by Judah Schvimer [ 19/Jan/18 ]

Oh, I think I was looking at PV0 code.

Comment by Spencer Brody (Inactive) [ 19/Jan/18 ]

Hmm... I think it's fine to mark the node as down if you haven't heard from it in the heartbeat timeout. The problem is automatically stepping down when all nodes are down. We should only step down when a majority of nodes have been down for the election timeout.
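
A rough sketch of the distinction drawn here, under stated assumptions: `MemberLiveness`, `isMarkedDown`, and `shouldStepDown` are hypothetical names, not the TopologyCoordinator's API, and "a majority of nodes have been down" is interpreted as "the primary, counting itself, has not heard from a strict majority within the election timeout".

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical per-member record: how long since we last heard from the member.
struct MemberLiveness {
    Milliseconds sinceLastHeartbeat;
};

// Marking a member as down uses the (shorter) heartbeat timeout.
inline bool isMarkedDown(const MemberLiveness& member, Milliseconds heartbeatTimeout) {
    return member.sinceLastHeartbeat >= heartbeatTimeout;
}

// Stepping down uses the election timeout: the primary steps down only when
// it has not heard from a strict majority (itself included) within that window.
inline bool shouldStepDown(const std::vector<MemberLiveness>& otherMembers,
                           Milliseconds electionTimeout) {
    const std::size_t total = otherMembers.size() + 1;  // include the primary
    std::size_t reachable = 1;                          // the primary itself
    for (const auto& member : otherMembers) {
        if (member.sinceLastHeartbeat < electionTimeout) {
            ++reachable;
        }
    }
    return reachable <= total / 2;  // no strict majority is reachable
}
```

Under this sketch, a member that has been silent for longer than the heartbeat timeout shows as down, but that alone does not trigger a step down while the election timeout is larger.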

Comment by Judah Schvimer [ 19/Jan/18 ]

Per conversation, we'll make the maximum 30 seconds.

Another problem will be heartbeat step downs. If we don't see heartbeats in the "heartbeatTimeoutPeriod" (not the heartbeat interval), then we'll set a node as down. If we receive a heartbeat and a majority of nodes are down, then we'll step down. This timeout is also 10 seconds by default. I think the only problem is here:
https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/topology_coordinator.cpp#L1097

I think this should maybe be the election timeout instead of the heartbeat timeout? spencer siyuan.zhou

Comment by Spencer Brody (Inactive) [ 19/Jan/18 ]

I think we probably also want to update #21, since it's used for liveness. I think both #21 and #1 could be set to the min of half the election timeout, or twice the heartbeat timeout.

I think we probably also want to put a cap on #26, maybe a minute.
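
The two suggestions in this comment could look roughly like the sketch below. The function names are hypothetical and this is not the change that was committed; it only spells out the arithmetic of "the min of half the election timeout, or twice the heartbeat timeout" for #1 and #21, and a one-minute cap for #26.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Milliseconds = std::chrono::milliseconds;

// Suggested bound for #1 and #21: the smaller of half the election timeout
// and twice the heartbeat timeout.
inline Milliseconds livenessTimeout(Milliseconds electionTimeout,
                                    Milliseconds heartbeatTimeout) {
    assert(electionTimeout >= Milliseconds::zero());
    assert(heartbeatTimeout >= Milliseconds::zero());
    return std::min(electionTimeout / 2, heartbeatTimeout * 2);
}

// Suggested bound for #26: the election timeout, capped at one minute.
inline Milliseconds voteRequestTimeout(Milliseconds electionTimeout) {
    assert(electionTimeout >= Milliseconds::zero());
    const Milliseconds oneMinute = std::chrono::minutes(1);
    return std::min(electionTimeout, oneMinute);
}
```

With a 10-second election timeout and a 10-second heartbeat timeout, `livenessTimeout` yields 5 seconds; with a very large election timeout it is bounded by twice the heartbeat timeout instead.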

Comment by Judah Schvimer [ 19/Jan/18 ]

There are 26 occurrences of "getElectionTimeoutPeriod()" by grep. Looking through all occurrences of "electiontimeout" case insensitive, I don't see any others that should be a problem:
1) https://github.com/mongodb/mongo/blob/0d8371f7e13b3455506f62d8e9129e4e66ed9a15/src/mongo/db/repl/oplog_fetcher.cpp#L74: This is used for PV1 getMore timeouts. getMores need to time out to get metadata. This needs to change. I recommend an upper bound of 10 seconds, or some multiple (like 5) of the heartbeat timeout.
2) Testing
3) https://github.com/mongodb/mongo/blob/7f56cb7f21a4d13dd4c4d39d85f23ec77cd28f9b/src/mongo/db/repl/repl_set_config.cpp#L848: This is for priority takeover, so for elections.
4) Header
5-13) Testing
14) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L74: This is used to make election timeouts a bit random in normal and priority takeover elections. This is definitely okay being a fraction of the election timeout.
15) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L723: Used for scheduling liveness checks. This is used for scheduling elections so it should use the election timeout.
16) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L802: Used for scheduling elections
17) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L857: Logging
18-20) Testing
21) https://github.com/mongodb/mongo/blob/fcbce71b912723aac77ab1ec4efec8e15114a86c/src/mongo/db/repl/sync_source_feedback.cpp#L56: This is used for liveness checks on replSetUpdatePosition commands when progress isn't getting made. If there are no elections, this also shouldn't be a problem.
22) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/topology_coordinator.cpp#L1018: This already uses the heartbeat interval as an alternative, so it's fine.
23) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/topology_coordinator.cpp#L1021: Same as 22
24) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/topology_coordinator.cpp#L1177: Used for stepping down, exactly what we want.
25) https://github.com/mongodb/mongo/blob/f25cab34c54e87de7983f801cd3ee50395366ced/src/mongo/db/repl/topology_coordinator.cpp#L3080: Used for running elections on new terms. It's fine to use this timeout.
26) https://github.com/mongodb/mongo/blob/c246ae62641c3559c38830f6f5f4981e0acffa0c/src/mongo/db/repl/vote_requester.cpp#L90: This one uses it as a network timeout for vote requests. There should be no vote requests so that should be fine. It may be worth putting a maximum on this.
