[SERVER-48928] Allow primary-elect to complete drain mode even if it is stepping down unconditionally Created: 17/Jun/20 Updated: 29/Oct/23 Resolved: 28/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0, 4.2.10, 4.4.2, 4.0.21 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Xuerui Fa |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: |
v4.4, v4.2, v4.0
|
||||||||||||
| Sprint: | Repl 2020-07-27, Repl 2020-08-10 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
We call ReplicaSetAwareServiceRegistry::onStepUpComplete and ReplicationCoordinatorExternalState::onTransitionToPrimary before calling _topCoord->completeTransitionToPrimary, which can fail. If it does fail, we don't seem to take any action to inform the external services we just told we were now primary that we actually no longer are.
To resolve this, we will allow nodes to complete drain mode, even if their action is set to kSteppingDown. This will let us invariant on the second check, instead of returning a Status. |
| Comments |
| Comment by Xuerui Fa [ 19/Aug/20 ] |
|
I think I put the wrong ticket number in my local branch when I pushed this ticket. Here's the git commit hook comment: Author: {'name': 'XueruiFa', 'email': 'xuerui.fa@mongodb.com', 'username': 'XueruiFa'}Message: |
| Comment by Siyuan Zhou [ 17/Jul/20 ] |
|
Great! Since the state transition and the hooks are executed without releasing RSTL, the proposal of ignoring kSteppingDown sounds good to me. |
| Comment by Xuerui Fa [ 14/Jul/20 ] |
|
Got it, thanks for the clarification. I searched for when the post member state update action might be set to kActionSteppedDown. It seems like this would only happen if the original state was PRIMARY and the new state is not PRIMARY, REMOVED or ROLLBACK. Looking at the usages of _updateMemberStateFromTopologyCoordinator, it seems the action could only be set to kActionSteppedDown if we are in a step down code path. In these cases, RSTL is obtained. Let me know if I missed anything in that investigation! |
| Comment by Siyuan Zhou [ 14/Jul/20 ] |
|
Yep! That's what I'm looking for. We need to make sure when onStepUpComplete is called, onStepDown must be called exactly once afterwards if the node steps down. |
| Comment by Spencer Brody (Inactive) [ 14/Jul/20 ] |
|
Nope, I assume Siyuan is referring to this? |
| Comment by Xuerui Fa [ 14/Jul/20 ] |
|
Thanks for the responses! I did a search for usages of onStepDown hooks and I couldn't find any usages. spencer, is this being implemented as part of Primary-Only Service? |
| Comment by Siyuan Zhou [ 14/Jul/20 ] |
|
It seems we are facing the same problem spencer made a great point. If kSteppingDown is allowed in canCompleteTransitionToPrimary, the server should never allow writing. I believe that's already true, since _leaderMode controls writability. For Spencer's last concern, xuerui.fa, could you please double check we only call ReplicaSetAwareServiceRegistry's onStepDown hooks with RSTL? |
| Comment by Spencer Brody (Inactive) [ 14/Jul/20 ] |
|
Haha, of course it was my fault! Why am I not surprised. I wonder if there's a way we could "complete" the transition to primary, but without actually allowing writes in that we know are going to be rolled back? Potentially by holding onto the global lock, or by never actually setting isMaster to true. Could the stepUp complete but then immediately hand off to stepdown, without releasing locks? Whatever we do here, I think it's important that once we notify external services that we are primary, if we then step down or never actually make it to fully functional primary, we need to inform those external services of the change. |
| Comment by Xuerui Fa [ 14/Jul/20 ] |
|
|
| Comment by Xuerui Fa [ 14/Jul/20 ] |
|
It seems like the check that might cause signalDrainComplete to fail after notifying external services is called twice within signalDrainComplete: first here and a second time when we call _topCoord->completeTransitionToPrimary. siyuan.zhou and I spent some time today exploring what might cause the check to pass the first time, but fail on the second call. The goal was to see if we could rely only on the first check, so that we could safely notify external services and then call completeTransitionToPrimary without the check. In many instances, the leader mode is modified only after obtaining RSTL. In signalDrainComplete, we acquire the RSTL lock before we do the first check. As a result, in most instances, we are confident that the leader mode has not changed since the first check, and so we can remove the second check. There are two exceptions to this, and both have to do with unconditional step down (here is one case. We modify leader mode in prepareForUnconditionalStepDown). In both cases, the leader mode is set to kSteppingDown before acquiring RSTL. Our current idea to resolve this is to modify the check so that it will allow the leader mode to be set to kSteppingDown, in addition to kLeaderElect and kAttemptingStepDown. This is because it is not possible for the rest of the unconditional step down code path to proceed, since we are holding RSTL. As a result, signalDrainComplete will finish with no problems, and only after releasing RSTL will the unconditional step down code path be allowed to continue. We will investigate more tomorrow to see if there are any issues with this solution. |
| Comment by Siyuan Zhou [ 01/Jul/20 ] |
|
Marked to schedule on next triage meeting. Sorry I missed that "Investigating" probably didn't show up on the triage meeting. |