[SERVER-65184] Avoid concurrent election and stepdown in downgrade_default_write_concern_majority.js Created: 01/Apr/22  Updated: 29/Oct/23  Resolved: 21/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.9

Type: Bug Priority: Major - P3
Reporter: Jason Chan Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2022-04-04, Repl 2022-04-18, Repl 2022-05-02
Participants:
Linked BF Score: 44

 Description   

As part of downgrading the cluster, we stop the config server mongod. Part of the shutdown process is stepping the node down before collection validation. However, an election runs concurrently with the stepdown, and the killOp thread for the resulting state transition kills the stepDown command with InterruptedDueToReplStateChange:

[js_test:downgrade_default_write_concern_majority] c20781| 2022-03-28T15:18:37.425+00:00 I ELECTION 21450 [ReplCoord-9] "Election succeeded, assuming primary role","attr":{"term":2}
[js_test:downgrade_default_write_concern_majority] c20781| 2022-03-28T15:18:37.425+00:00 I REPL 21358 [ReplCoord-9] "Replica set state transition","attr":{"newState":"PRIMARY","oldState":"SECONDARY"}
...
 [js_test:downgrade_default_write_concern_majority] c20781| 2022-03-28T15:18:37.456+00:00 I COMMAND 21579 [conn94] "Attempting to step down in response to replSetStepDown command"
 ...
 [js_test:downgrade_default_write_concern_majority] c20781| 2022-03-28T15:18:37.487+00:00 I REPL 21343 [RstlKillOpThread] "Starting to kill user operations"
 [js_test:downgrade_default_write_concern_majority] c20781| 2022-03-28T15:18:37.490+00:00 I REPL 21344 [RstlKillOpThread] "Stopped killing user operations"
 [js_test:downgrade_default_write_concern_majority] c20781| 2022-03-28T15:18:37.490+00:00 I REPL 21340 [RstlKillOpThread] "State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepUp","userOpsKilled":1,"userOpsRunning":4}}

One way to fix this is to set the cluster's secondaries to votes: 0, since we don't expect to test election behavior in this test. An alternative is to add InterruptedDueToReplStateChange to the stepdown handling at https://github.com/10gen/mongo/blob/1cc143da4077560d714d99471b8006c0dec5f66a/jstests/libs/override_methods/validate_collections_on_shutdown.js#L87 in validate_collections_on_shutdown.js.
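
The two options could look roughly like the sketch below. This is only an illustration, not the change that was committed: makeSecondariesNonVoting and isAcceptableStepDownResponse are hypothetical helpers, and the replica set name and hosts are made up. It relies on the MongoDB rule that a member with votes: 0 must also have priority: 0, and on command responses carrying a codeName field.

```javascript
// Hypothetical helper (not part of the ticket or the MongoDB test libraries):
// returns a copy of a replica set config in which every member except the one
// at primaryIndex is non-voting, so it cannot call or win an election.
function makeSecondariesNonVoting(config, primaryIndex = 0) {
    const members = config.members.map((member, i) =>
        i === primaryIndex
            ? {...member}
            // MongoDB requires members with votes: 0 to also have priority: 0.
            : {...member, votes: 0, priority: 0});
    return {...config, members};
}

// Hypothetical helper: treats a replSetStepDown response as acceptable when it
// succeeded or when a concurrent replica set state change interrupted it.
function isAcceptableStepDownResponse(res) {
    return res.ok === 1 || res.codeName === "InterruptedDueToReplStateChange";
}

// Example with a made-up three-node config server replica set.
const config = {
    _id: "configRS",
    version: 1,
    members: [
        {_id: 0, host: "localhost:20780"},
        {_id: 1, host: "localhost:20781"},
        {_id: 2, host: "localhost:20782"},
    ],
};
const updated = makeSecondariesNonVoting(config);
console.log(JSON.stringify(updated.members, null, 2));
```

The first option removes the race entirely because only one node can ever become primary. The second keeps elections possible and only tolerates the interruption when it happens.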



 Comments   
Comment by Githook User [ 21/Apr/22 ]

Author: {'name': 'Jason Chan', 'email': 'jason.chan@mongodb.com', 'username': 'jasonjhchan'}

Message: SERVER-65184 Avoid concurrent election and stepdown in downgrade_default_write_concern_majority.js
Branch: v5.0
https://github.com/mongodb/mongo/commit/5c94c99848074252570baa36a0d5b21a7493143c
