[SERVER-40252] Signaling 1-node replica set to shut down now takes an extra 10 seconds Created: 21/Mar/19 Updated: 08/Jan/24 Resolved: 15/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Max Hirschhorn | Assignee: | Backlog - Service Architecture |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Service Arch |
| Operating System: | ALL |
| Sprint: | Service Arch 2019-03-25 |
| Participants: |
| Description |
|
It seems like attempting to run ReplicationCoordinator::stepDown() is unnecessary when the replica set configuration is known to only contain one node electable as primary. The extra time it takes to shut down the replica set is mildly annoying for certain aspects of my local development workflow.
|
| Comments |
| Comment by Mira Carey [ 15/Apr/19 ] | ||||||||
|
Closing this out after the change made in I think that satisfies the intent of this ticket | ||||||||
| Comment by Vesselina Ratcheva (Inactive) [ 25/Mar/19 ] | ||||||||
|
I think the fix Jason pointed out in the topology coordinator is the way to go implementation-wise (it can also be made in isSafeToStepDown), provided we come to a consensus about user-facing behavior. In the same spirit as the proposition | ||||||||
| Comment by Mira Carey [ 25/Mar/19 ] | ||||||||
|
After some reflection (and conversation with max.hirschhorn), I'm going to move this to "features we're not sure of" for now. If we don't want to tackle allowing shutdown in more configurations, we should probably just make the timeout configurable (and make it 0 for most tests). I've opened | ||||||||
| Comment by Andy Schwerin [ 22/Mar/19 ] | ||||||||
|
Absolutely. My point is we shouldn't fix this regression by trading it for another user-facing behavior change without considering it. | ||||||||
| Comment by Danny Hatcher (Inactive) [ 22/Mar/19 ] | ||||||||
|
If | ||||||||
| Comment by Andy Schwerin [ 22/Mar/19 ] | ||||||||
|
I am reluctant to change the user-facing behavior of the stepDown and shutDown commands in this instance to make our tests run faster. I made a conscious decision to require the user to force shutdown whenever there is no other electable node. At the very least, we should let product weigh in. We might also have to update the documentation. | ||||||||
| Comment by Max Hirschhorn [ 21/Mar/19 ] | ||||||||
|
FWIW, I filed this ticket because of my use of 1-node replica sets locally, but I think the change should apply to any replica set where electableCount == 1. Stepping down a single voting replica set may still be useful for testing purposes, i.e. to have the primary actually transition to state SECONDARY, but to just skip the election handoff part. | ||||||||
| Comment by Mira Carey [ 21/Mar/19 ] | ||||||||
|
I think the fix here is to make repl coordinator stepDown, or topology coordinator attemptStepDown, return quickly if the configured set has 1 node. That would fix the slowness on SIGTERM, and make the shutdown command do something sane for 1-node repl sets. At a glance, I'd probably change https://github.com/mongodb/mongo/blob/2a4d8ed5bb64af081b887f17dabf298831866b1d/src/mongo/db/repl/topology_coordinator.cpp#L2237
so that there is an additional check for single-node sets. | ||||||||
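
A minimal sketch of the kind of check proposed here, assuming a simplified stand-in for the replica set config. `ReplSetConfigSketch`, `StepDownDecision`, `decideShutdownStepDown`, and `shutdownStepDownBudget` are hypothetical names made up for illustration; they are not the actual topology coordinator types or functions. The 10-second figure in the usage note is taken from the ticket title. The sketch only illustrates the early return on the shutdown path and does not settle the user-facing behavior question raised elsewhere in this thread.

```cpp
#include <chrono>

// Hypothetical, simplified stand-in for the replica set configuration.
// Not MongoDB's real config type.
struct ReplSetConfigSketch {
    int electableCount;  // members electable as primary, including this node
};

// What the shutdown path should do about the stepdown attempt (hypothetical).
enum class StepDownDecision {
    kSkipCatchUpWait,  // no other electable node, so there is nobody to wait for
    kWaitForCatchUp    // wait for an electable secondary, up to the timeout
};

// Early-return check: with at most one electable node, waiting for a
// caught-up electable secondary can never succeed, so skip the wait.
inline StepDownDecision decideShutdownStepDown(const ReplSetConfigSketch& cfg) {
    if (cfg.electableCount <= 1) {
        return StepDownDecision::kSkipCatchUpWait;
    }
    return StepDownDecision::kWaitForCatchUp;
}

// Time the shutdown path would budget for the stepdown attempt:
// zero when the wait is skipped, otherwise the configured timeout.
inline std::chrono::seconds shutdownStepDownBudget(
    const ReplSetConfigSketch& cfg, std::chrono::seconds configuredTimeout) {
    if (decideShutdownStepDown(cfg) == StepDownDecision::kSkipCatchUpWait) {
        return std::chrono::seconds(0);
    }
    return configuredTimeout;
}
```

Under these assumptions, a set with `electableCount == 1` would budget zero seconds for stepdown on shutdown, while a set with three electable members would still wait up to the configured timeout (10 seconds in the case this ticket describes).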
| Comment by Judah Schvimer [ 21/Mar/19 ] | ||||||||
|
This feels pretty costly in terms of evergreen time spent. CC mira.carey@mongodb.com for any thoughts. | ||||||||
| Comment by Max Hirschhorn [ 21/Mar/19 ] | ||||||||
|
I would vote for changing the replSetStepDown command because you also cannot use the shutdown command without force=true to shut down a 1-node replica set. |