[SERVER-35058] Don't only rely on heartbeat to signal secondary positions in stepdown command Created: 17/May/18  Updated: 29/Oct/23  Resolved: 03/Jul/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.6.7, 4.0.2, 4.1.1

Type: Task Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Vesselina Ratcheva (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-35623 Send a replSetStepUp command to an el... Closed
is related to SERVER-53612 StepDown hangs until timeout if all n... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0, v3.6
Sprint: Repl 2018-06-04, Repl 2018-06-18, Repl 2018-07-02
Participants:

 Description   

The replSetStepDown command waits for a majority of nodes to catch up and for one of them to be an eligible candidate. However, that condition is only signaled while processing heartbeat responses, which adds delay to the handoff.
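The condition the stepdown command waits for can be written as a predicate over the members' last applied optimes. The following is a minimal sketch, not MongoDB's implementation. It assumes optimes are plain integers and that every member votes, and it uses a boolean `electable` flag in place of whatever eligibility rules the config imposes. `Member` and `stepdown_can_proceed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical, simplified secondary state: optimes modeled as integers,
    # eligibility reduced to a single flag.
    name: str
    last_applied: int
    electable: bool


def stepdown_can_proceed(primary_last_applied: int, members: list[Member]) -> bool:
    """True when a majority of the set (primary included) has caught up to the
    primary's last applied optime and at least one caught-up secondary is electable."""
    caught_up = [m for m in members if m.last_applied >= primary_last_applied]
    total_nodes = len(members) + 1                  # +1 for the primary itself
    majority = total_nodes // 2 + 1
    has_majority = len(caught_up) + 1 >= majority   # the primary counts toward the majority
    has_candidate = any(m.electable for m in caught_up)
    return has_majority and has_candidate
```

The problem described above is not the predicate but *when* it gets re-evaluated. If it is only checked after a heartbeat response, then a secondary that reports its new position some other way (for example through a replSetUpdatePosition command) leaves the stepdown waiting until the next heartbeat arrives.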

The easiest, though least efficient, fix is to signal the condition variable whenever we update the last applied optime. A better solution is to replace the condition variable with a waiter in _replicationWaiterList, as _awaitReplication_inlock() does. A third option is to call _awaitReplication_inlock() directly. That may not be desirable, because the condition the stepdown command waits on (a caught-up majority plus an eligible candidate, as specified in the config) is slightly different from a plain w: majority wait.
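The waiter-list approach can be modeled in a few lines. This is a minimal sketch under stated assumptions and does not reproduce the actual C++ `_replicationWaiterList` or `ThreadWaiter` types. Each waiter registers its own predicate. Every code path that advances a member's optime calls a single notification hook, whether that update came from a heartbeat or not. The hook then wakes only the waiters whose predicate now holds. `WaiterList`, `waiter`, and `on_position_update` are hypothetical names. The context manager removes its own entry on exit. That loosely parallels the idea named in the "Defer removing ReplicationCoordinator ThreadWaiters to their WaiterGuards" commit, although the real change may differ.

```python
import threading
from contextlib import contextmanager
from typing import Callable


class WaiterList:
    """Hypothetical model: waiters are woken when their own predicate holds,
    re-checked whenever any member's position changes."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._waiters: list[tuple[Callable[[], bool], threading.Event]] = []

    @contextmanager
    def waiter(self, predicate: Callable[[], bool]):
        # Register a waiter; if the condition already holds, wake it immediately.
        event = threading.Event()
        entry = (predicate, event)
        with self._mutex:
            self._waiters.append(entry)
            if predicate():
                event.set()
        try:
            yield event
        finally:
            # The guard (context-manager exit) removes the waiter, so the
            # notifying code never has to mutate the list itself.
            with self._mutex:
                self._waiters.remove(entry)

    def on_position_update(self) -> None:
        # Called after any optime update, heartbeat-driven or not.
        # In real code the update and this notification would share one lock.
        with self._mutex:
            for predicate, event in self._waiters:
                if predicate():
                    event.set()
```

Compared with calling notify_all() on a shared condition variable after every optime update (the first option), this wakes only the waiters whose condition is satisfied. It also makes the stepdown wait independent of which code path reported the new position.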



 Comments   
Comment by Githook User [ 04/Apr/21 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-35058 Defer removing ReplicationCoordinator ThreadWaiters to their WaiterGuards

(cherry picked from commit 9df6cbae9c20bdce759deb806e5175b7fc83d007)
Branch: v3.6
https://github.com/mongodb/mongo/commit/0395f6fae741624906529958d091c8762e72b594

Comment by Githook User [ 09/Aug/18 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-35058 Do not rely only on heartbeats to signal secondary positions in the stepdown command

(cherry picked from commit 925a113194e00e193318486f576d14e6c3e27ea1)
Branch: v3.6
https://github.com/mongodb/mongo/commit/5ce451fae4e097bb5673b73f5ff6c070e25d5d62

Comment by Githook User [ 06/Aug/18 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-35058 Do not rely only on heartbeats to signal secondary positions in the stepdown command

(cherry picked from commit 925a113194e00e193318486f576d14e6c3e27ea1)
Branch: v4.0
https://github.com/mongodb/mongo/commit/19f0e23c88ef63aa554895c04d4318f27ce73559

Comment by Githook User [ 06/Aug/18 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-35058 Defer removing ReplicationCoordinator ThreadWaiters to their WaiterGuards

(cherry picked from commit 9df6cbae9c20bdce759deb806e5175b7fc83d007)
Branch: v4.0
https://github.com/mongodb/mongo/commit/fe1b92cee5c133e82845ffbd31b25ab5b66084d3

Comment by Githook User [ 03/Jul/18 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-35058 Do not rely only on heartbeats to signal secondary positions in the stepdown command
Branch: master
https://github.com/mongodb/mongo/commit/925a113194e00e193318486f576d14e6c3e27ea1

Comment by Githook User [ 21/Jun/18 ]

Author:

{'username': 'vessy-mongodb', 'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com'}

Message: SERVER-35058 Defer removing ReplicationCoordinator ThreadWaiters to their WaiterGuards
Branch: master
https://github.com/mongodb/mongo/commit/9df6cbae9c20bdce759deb806e5175b7fc83d007

Generated at Thu Feb 08 04:38:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.