[SERVER-60699] Investigate more frequent occurrences of PrimarySteppedDown errors during replica set shutdown in 5.1 Created: 14/Oct/21 Updated: 27/Oct/23 Resolved: 18/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.1.0-rc0 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jason Chan | Assignee: | Jason Chan |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Repl 2021-10-18, Repl 2021-11-01 | ||||||||
| Participants: | |||||||||
| Description |
|
When testing 5.1.0-rc0, users are experiencing frequent shutdown failures due to PrimarySteppedDown errors. We should investigate whether anything changed in the server between 5.0 and 5.1 regarding the performance of shutdown, or of stepdown due to loss of majority. |
| Comments |
| Comment by Jason Chan [ 18/Oct/21 ] |
|
Looking at the logs for 5.0 and 5.1, I don't see any difference in shutdown times. Quiesce mode behaves the same in both versions: the node quiesces for the entire timeoutSecs. The actual shutdown of the server after quiesce mode takes < 1 second in both versions. We believe the longer shutdown times were observed because users were using arbiters in 5.0, but not in 5.1. Arbiters skip quiesce mode entirely, since we only ever quiesce as a secondary. |
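The explanation above can be sketched as a small model. This is only an illustration of the reasoning in this comment, not server code: the function name `expected_quiesce_secs` and the `is_arbiter` flag are hypothetical, and the model ignores the sub-second shutdown work that follows quiesce mode.

```python
def expected_quiesce_secs(is_arbiter: bool, timeout_secs: int) -> int:
    """Simplified model of time spent in quiesce mode during shutdown.

    Assumption taken from this ticket: arbiters skip quiesce mode
    entirely, while a node shutting down as a secondary quiesces for
    the full timeoutSecs.
    """
    if is_arbiter:
        return 0
    return timeout_secs


# With timeoutSecs = 59, an arbiter spends no time quiescing, while a
# secondary spends the whole 59 seconds in quiesce mode.
arbiter_wait = expected_quiesce_secs(is_arbiter=True, timeout_secs=59)
secondary_wait = expected_quiesce_secs(is_arbiter=False, timeout_secs=59)
```

Under this model, replacing an arbiter with a data-bearing member makes shutting down that member take about timeoutSecs longer, even though the server's behavior is unchanged.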
| Comment by Jason Chan [ 15/Oct/21 ] |
|
Quiesce mode takes about a full minute because timeoutSecs is set to 59 seconds, and quiesce mode uses timeoutSecs as its quiesce duration. |
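For reference, a minimal sketch of a shutdown command document carrying the 59-second value mentioned here. The helper name `build_shutdown_command` is hypothetical, and the commented-out pymongo call and host name are assumptions about how a test harness might send the command, not something taken from this ticket.

```python
def build_shutdown_command(timeout_secs: int) -> dict:
    """Build a `shutdown` admin command document.

    Per this ticket, quiesce mode uses timeoutSecs as its quiesce
    duration, so this value largely determines how long the node
    takes to shut down.
    """
    if timeout_secs < 0:
        raise ValueError("timeout_secs must be non-negative")
    # The command name must be the first key; dicts preserve insertion order.
    return {"shutdown": 1, "timeoutSecs": timeout_secs}


cmd = build_shutdown_command(59)

# Hypothetical usage with pymongo (requires a running mongod; the
# connection is expected to drop once the server shuts down):
#   from pymongo import MongoClient
#   MongoClient("mongodb://example-host:27017").admin.command(cmd)
```

Lowering timeoutSecs in such a command would shorten the quiesce period accordingly, at the cost of giving clients less time to move to other nodes.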
| Comment by Judah Schvimer [ 15/Oct/21 ] |
|
It seems from the logs that quiesce mode is taking a full minute. I'm curious what changed there in 5.1.