[SERVER-39777] step down nodes with a high freeze timeout before validating them on shutdown Created: 22/Feb/19  Updated: 29/Oct/23  Resolved: 03/Apr/19

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.1.10, 4.0.13

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-40290 Race in last_vote.js Closed
Related
related to SERVER-42747 validate_collections_on_shutdown.js s... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2019-03-11, Repl 2019-03-25, Repl 2019-04-08
Linked BF Score: 23

 Description   

If a node steps down while it is being validated, the validation fails. Freezing secondary nodes and stepping down primary nodes, both with a high freeze timeout, should fix this.
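A minimal Python sketch of the approach described above picks a per-node admin command based on the node's role. `replSetFreeze`, `replSetStepDown`, and its `force` option are real MongoDB admin commands. Everything else is hypothetical: the `NodeState` type, the helper names, the host names, and the 24-hour timeout. It also assumes that secondaries should be frozen before the primary steps down. The sketch only builds the command documents. A driver would still have to send them, for example pymongo's `Database.command` on the `admin` database.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical value: the ticket only asks for a "high" timeout; 24 hours is illustrative.
FREEZE_TIMEOUT_SECS = 24 * 60 * 60


@dataclass
class NodeState:
    """Hypothetical summary of a node's role, e.g. derived from its isMaster reply."""
    host: str
    is_primary: bool
    is_secondary: bool


def pre_validation_command(node: NodeState,
                           timeout_secs: int = FREEZE_TIMEOUT_SECS) -> Optional[Dict[str, Any]]:
    """Return the admin command that keeps `node` from changing role during
    validation, or None if it is neither primary nor secondary (e.g. an arbiter)."""
    if node.is_primary:
        # Freezing applies to non-primary members, so step the primary down instead.
        # It stays ineligible to become primary for `timeout_secs` seconds;
        # force=True steps down even if no electable secondary has caught up.
        return {"replSetStepDown": timeout_secs, "force": True}
    if node.is_secondary:
        # A frozen secondary will not seek election for `timeout_secs` seconds.
        return {"replSetFreeze": timeout_secs}
    return None


def pre_validation_plan(nodes: List[NodeState]) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (host, command) pairs with all freezes ordered before stepdowns.

    Assumption: freezing secondaries first keeps one of them from winning an
    election right after the primary steps down.
    """
    freezes: List[Tuple[str, Dict[str, Any]]] = []
    stepdowns: List[Tuple[str, Dict[str, Any]]] = []
    for node in nodes:
        cmd = pre_validation_command(node)
        if cmd is None:
            continue
        target = stepdowns if "replSetStepDown" in cmd else freezes
        target.append((node.host, cmd))
    return freezes + stepdowns


if __name__ == "__main__":
    nodes = [
        NodeState("node0:27017", is_primary=True, is_secondary=False),
        NodeState("node1:27017", is_primary=False, is_secondary=True),
        NodeState("arbiter:27017", is_primary=False, is_secondary=False),
    ]
    for host, cmd in pre_validation_plan(nodes):
        print(host, cmd)
```

The freeze-then-stepdown ordering is a design choice in this sketch, not something the ticket specifies. With every secondary frozen, `force: True` lets the primary step down even though no electable secondary can take over.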



 Comments   
Comment by Githook User [ 13/Aug/19 ]

Author:

{'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah@mongodb.com'}

Message: SERVER-39777 ensure nodes cannot step down during shutdown validation

(cherry picked from commit ab99966275dce28a052446be4c70a500956f507b)
Branch: v4.0
https://github.com/mongodb/mongo/commit/e8b17d8be314671f303df2dab22c3402e4a01c47

Comment by Githook User [ 03/Apr/19 ]

Author:

{'name': 'Judah Schvimer', 'username': 'judahschvimer', 'email': 'judah@mongodb.com'}

Message: SERVER-39777 ensure nodes cannot step down during shutdown validation
Branch: master
https://github.com/mongodb/mongo/commit/ab99966275dce28a052446be4c70a500956f507b

Comment by Max Hirschhorn [ 26/Feb/19 ]

FWIW, validate_collections_on_shutdown.js was intended to use the network retry logic in command_sequence_with_retries.js to tolerate stepdowns by simply retrying the listDatabases or validate commands. Forcing the stepdown to happen sooner (and not letting the node step back up) sounds like a more reliable approach than inspecting the raw response to the validate command to see whether it failed with an InterruptedDueToStepDown error and, if so, retrying it.
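The trade-off in this comment can be illustrated with a small Python sketch. It is loosely modeled on, not copied from, command_sequence_with_retries.js. A retry wrapper that only handles network errors resends a command when the connection drops. A stepdown that interrupts validate may instead return an ordinary error response, and that response is only retried if the caller inspects its `codeName`. The names `run_with_retries`, `NetworkError`, and `scripted` are hypothetical. Only the `ok`/`codeName` response fields and the InterruptedDueToStepDown name come from the server or the comment.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, Union

Response = Dict[str, Any]


class NetworkError(Exception):
    """Hypothetical stand-in for a dropped connection."""


def run_with_retries(run_command: Callable[[Response], Response],
                     cmd: Response,
                     max_attempts: int = 3,
                     retry_code_names: FrozenSet[str] = frozenset()) -> Response:
    """Run `cmd`, retrying on NetworkError and, optionally, on error responses
    whose codeName is in `retry_code_names`. Returns the last response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        last = attempt == max_attempts
        try:
            res = run_command(cmd)
        except NetworkError:
            if last:
                raise
            continue  # connection dropped: simply resend the command
        # Extra step: inspect the raw response for a retryable error code.
        retryable = res.get("ok") != 1 and res.get("codeName") in retry_code_names
        if not retryable or last:
            return res
    raise AssertionError("unreachable")


def scripted(outcomes: Iterable[Union[Response, Exception]]) -> Callable[[Response], Response]:
    """Hypothetical test double: replays responses in order, raising any exceptions."""
    it = iter(outcomes)

    def run(cmd: Response) -> Response:
        outcome = next(it)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    return run


if __name__ == "__main__":
    stepdown_error = {"ok": 0, "codeName": "InterruptedDueToStepDown"}
    success = {"ok": 1, "valid": True}
    # Network-only retries hand the stepdown error straight back to the caller.
    print(run_with_retries(scripted([stepdown_error, success]), {"validate": "coll"}))
    # Inspecting codeName turns the same failure into a retry.
    print(run_with_retries(scripted([stepdown_error, success]), {"validate": "coll"},
                           retry_code_names=frozenset({"InterruptedDueToStepDown"})))
```

The `retry_code_names` check corresponds to the response inspection the comment calls less reliable. Freezing and stepping nodes down before validation removes the need for it.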

Generated at Thu Feb 08 04:53:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.