[SERVER-36817] replSetFreeze command run by stepdown thread may fail when server is already primary Created: 23/Aug/18  Updated: 29/Oct/23  Resolved: 27/Aug/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.6.10, 4.0.6, 4.1.3

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Jonathan Abrahams
Resolution: Fixed Votes: 0
Labels: tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
is caused by SERVER-35383 Increase electionTimeoutMillis for th... Closed
Related
related to SERVER-36868 Update error code in the stepdown hook. Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: TIG 2018-09-10
Participants:
Linked BF Score: 0
Story Points: 1

 Description   

As part of the changes to address SERVER-35383 and based on this comment from SERVER-35124, the stepdown thread in resmoke.py runs the {replSetFreeze: 0} command to make the former primary electable in the next round of stepdowns. Since the primary is only stepped down for 10 seconds (by default), it is possible for enough time to have passed for the primary to have tried to step back up on its own before the {replSetFreeze: 0} command is run.

We either need to handle the "OperationFailure: cannot freeze node when primary or running for election. state: Primary" exception or prevent it from occurring.
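A minimal sketch of the "handle the exception" option, not the committed fix. It assumes a PyMongo-style client that exposes admin.command. The helper name unfreeze_node is hypothetical, and matching on the message text quoted above (rather than on an error code) is an assumption made only for illustration.

```python
try:
    from pymongo.errors import OperationFailure
except ImportError:
    # Stand-in so the sketch runs without pymongo installed.
    class OperationFailure(Exception):
        def __init__(self, error, code=None):
            super().__init__(error)
            self.code = code

# Message text quoted in this ticket; matching on it is an assumption.
ALREADY_PRIMARY_MSG = "cannot freeze node when primary or running for election"


def unfreeze_node(client):
    """Run {replSetFreeze: 0} and tolerate the node already being primary.

    Returns True if the command succeeded, False if the node had already
    stepped back up on its own. Any other failure is re-raised.
    """
    try:
        client.admin.command("replSetFreeze", 0)
        return True
    except OperationFailure as err:
        if ALREADY_PRIMARY_MSG in str(err):
            # The node is already primary, so there is nothing to unfreeze.
            return False
        raise
```

Re-raising everything else keeps unrelated failures visible to the stepdown thread instead of silently swallowing them.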



 Comments   
Comment by Githook User [ 22/Dec/18 ]

Author:

{'username': 'hptabster', 'email': 'jonathan@mongodb.com', 'name': 'Jonathan Abrahams'}

Message: SERVER-36817 replSetFreeze command run by stepdown thread may fail when server is already primary

(cherry picked from commit 0c0a4acea4a1c7bb579f5aaaa89a6f1545cf22ef)
Branch: v3.6
https://github.com/mongodb/mongo/commit/f3e59b921a82be829c5adee055b9875232adfe95

Comment by Githook User [ 22/Dec/18 ]

Author:

{'username': 'hptabster', 'email': 'jonathan@mongodb.com', 'name': 'Jonathan Abrahams'}

Message: SERVER-36817 replSetFreeze command run by stepdown thread may fail when server is already primary

(cherry picked from commit 0c0a4acea4a1c7bb579f5aaaa89a6f1545cf22ef)
Branch: v4.0
https://github.com/mongodb/mongo/commit/ccbe71910133c84645496ce360f6b65564d415fa

Comment by Githook User [ 27/Aug/18 ]

Author:

{'name': 'Jonathan Abrahams', 'email': 'jonathan@mongodb.com', 'username': 'hptabster'}

Message: SERVER-36817 replSetFreeze command run by stepdown thread may fail when server is already primary
Branch: master
https://github.com/mongodb/mongo/commit/0c0a4acea4a1c7bb579f5aaaa89a6f1545cf22ef

Comment by Max Hirschhorn [ 23/Aug/18 ]

We either need to handle the "OperationFailure: cannot freeze node when primary or running for election. state: Primary" exception or prevent it from occurring.

Allowing a node to step back up on its own violates the principle of ensuring the stepdown thread is in complete control over which node is primary at any moment. I'd be in favor of removing the stepdown_duration_secs configuration option and instead having it always be a very long time (e.g. 24 hours) so that the stepdown thread must run the replSetStepUp command for a node to ever become primary. CC judah.schvimer

If we go down this route, then I think we should leave cleaning up the exception handling for the replSetStepUp command to SERVER-36451.
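A rough sketch of the "prevent it from occurring" option described in this comment, not what resmoke.py actually does. It assumes PyMongo-style clients for the two nodes. The helper name hand_over_primary and the constant LONG_STEPDOWN_SECS are hypothetical. Depending on the server version, replSetStepDown may close client connections, which this sketch does not handle.

```python
# "A very long time (e.g. 24 hours)", as suggested in the comment above.
LONG_STEPDOWN_SECS = 24 * 60 * 60


def hand_over_primary(old_primary, new_primary, stepdown_secs=LONG_STEPDOWN_SECS):
    """Step down the current primary and explicitly elect the chosen node.

    The long stepdown period is meant to keep the old primary from stepping
    back up on its own, so that only an explicit replSetStepUp decides which
    node becomes primary.
    """
    # Sends {replSetStepDown: <secs>, force: true} to the current primary.
    old_primary.admin.command("replSetStepDown", stepdown_secs, force=True)
    # Sends {replSetStepUp: 1} to the node chosen by the stepdown thread.
    new_primary.admin.command("replSetStepUp", 1)
```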
