[SERVER-60699] Investigate more frequent occurrences of PrimarySteppedDown errors during replica set shutdown in 5.1 Created: 14/Oct/21  Updated: 27/Oct/23  Resolved: 18/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.1.0-rc0
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Jason Chan Assignee: Jason Chan
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-60673 Shutdown cmd on primary should ignore... Open
Sprint: Repl 2021-10-18, Repl 2021-11-01
Participants:

 Description   

When testing 5.1.0-rc0, users are experiencing frequent shutdown failures due to PrimarySteppedDown errors. We should investigate whether anything changed in the server between 5.0 and 5.1 that affects the performance of shutdown, or of stepdown due to loss of majority.



 Comments   
Comment by Jason Chan [ 18/Oct/21 ]

Looking at the logs for 5.0 and 5.1, I don't see any difference in shutdown times. Quiesce mode behaves the same in both versions: it quiesces for the entire timeoutSecs. The actual shutdown of the server after quiesce mode takes < 1 second in both versions. We believe the longer shutdown times were observed because users were running arbiters in 5.0 but not in 5.1. Arbiters skip quiesce mode entirely, since we only ever quiesce as a secondary.
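
The timing described in this comment can be sketched as a toy model. This is only an illustration of the behavior stated above, not server code: the function name, the role strings, and the assumption that a primary quiesces for the full timeoutSecs after stepping down are all hypothetical simplifications.

```python
def quiesce_secs(member_role: str, timeout_secs: int) -> int:
    """Toy model of how long a member spends in quiesce mode on shutdown.

    Hypothetical helper that encodes only what the comment states:
    arbiters skip quiesce mode, while data-bearing members quiesce
    (as a secondary) for the entire timeoutSecs.
    """
    if timeout_secs < 0:
        raise ValueError("timeout_secs must be non-negative")
    if member_role == "arbiter":
        # Arbiters never quiesce, so they shut down almost immediately.
        return 0
    if member_role in ("primary", "secondary"):
        # Assumption: a primary steps down first and then quiesces
        # as a secondary for the full duration.
        return timeout_secs
    raise ValueError(f"unknown member role: {member_role!r}")


if __name__ == "__main__":
    for role in ("primary", "secondary", "arbiter"):
        print(role, quiesce_secs(role, 59))
```

Under this model, a deployment that replaces an arbiter with a data-bearing member would see that node's shutdown go from near-instant to roughly timeoutSecs, which matches the explanation given here.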

Comment by Jason Chan [ 15/Oct/21 ]

Quiesce mode takes a full minute because timeoutSecs is set to 59 seconds, and quiesce mode uses timeoutSecs as the quiesce duration.
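
For reference, a minimal sketch of the shutdown command document that carries this timeoutSecs value. The helper name is hypothetical, and the commented-out driver call assumes a PyMongo client connected to the member being shut down; it is not taken from the logs in this ticket.

```python
def build_shutdown_command(timeout_secs: int, force: bool = False) -> dict:
    """Build a `shutdown` admin command document (hypothetical helper).

    The command name must be the first key; Python dicts preserve
    insertion order, so `shutdown` stays first.
    """
    if timeout_secs < 0:
        raise ValueError("timeout_secs must be non-negative")
    return {"shutdown": 1, "force": force, "timeoutSecs": timeout_secs}


if __name__ == "__main__":
    cmd = build_shutdown_command(59)
    print(cmd)
    # Assumed usage with PyMongo (not executed here):
    #   from pymongo import MongoClient
    #   MongoClient("mongodb://localhost:27017").admin.command(cmd)
```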

Comment by Judah Schvimer [ 15/Oct/21 ]

It seems from the logs that quiesce mode is taking a full minute. I'm curious what changed there in 5.1.
