[SERVER-48928] Allow primary-elect to complete drain mode even if it is stepping down unconditionally Created: 17/Jun/20  Updated: 29/Oct/23  Resolved: 28/Jul/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0, 4.2.10, 4.4.2, 4.0.21

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Xuerui Fa
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-31572 Update faulty invariant in TopologyCo... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Repl 2020-07-27, Repl 2020-08-10
Participants:

 Description   

We call ReplicaSetAwareServiceRegistry::onStepUpComplete and ReplicationCoordinatorExternalState::onTransitionToPrimary before calling _topCoord->completeTransitionToPrimary, which can fail.

If it does fail, we don't seem to take any action to inform the external services we just told we were now primary that we actually no longer are.

 

To resolve this, we will allow nodes to complete drain mode, even if their action is set to kSteppingDown. This will let us invariant on the second check, instead of returning a Status.



 Comments   
Comment by Xuerui Fa [ 19/Aug/20 ]

I think I put the wrong ticket number in my local branch when I pushed this ticket. Here's the git commit hook comment:

Author:

{'name': 'XueruiFa', 'email': 'xuerui.fa@mongodb.com', 'username': 'XueruiFa'}

Message: SERVER-48938: Allow primary-elect to complete drain mode even if it is stepping down unconditionally
Branch: master
https://github.com/mongodb/mongo/commit/f11b0351c33d2888607eebd1748d524d241fc9ba

Comment by Siyuan Zhou [ 17/Jul/20 ]

Great! Since the state transition and the hooks are executed without releasing RSTL, the proposal of ignoring kSteppingDown sounds good to me.

Comment by Xuerui Fa [ 14/Jul/20 ]

Got it, thanks for the clarification. I searched for when the post member state update action might be set to kActionSteppedDown. It seems like this would only happen if the original state was PRIMARY and the new state is not PRIMARY, REMOVED or ROLLBACK. Looking at the usages of _updateMemberStateFromTopologyCoordinator, it seems the action could only be set to kActionSteppedDown if we are in a step down code path. In these cases, RSTL is obtained.

Let me know if I missed anything in that investigation!

Comment by Siyuan Zhou [ 14/Jul/20 ]

Yep! That's what I'm looking for. We need to make sure when onStepUpComplete is called, onStepDown must be called exactly once afterwards if the node steps down.

Comment by Spencer Brody (Inactive) [ 14/Jul/20 ]

Nope, I assume Siyuan is referring to this?

Comment by Xuerui Fa [ 14/Jul/20 ]

Thanks for the responses! I did a search for usages of onStepDown hooks and I couldn't find any usages. spencer, is this being implemented as part of Primary-Only Service? 

Comment by Siyuan Zhou [ 14/Jul/20 ]

It seems we are facing the same problem SERVER-31572. Thanks for digging into that xuerui.fa!

spencer made a great point. If kSteppingDown is allowed in canCompleteTransitionToPrimary, the server should never allow writing. I believe that's already true, since _leaderMode controls writability.

For Spencer's last concern, xuerui.fa, could you please double check we only call ReplicaSetAwareServiceRegistry's  onStepDown hooks with RSTL?

Comment by Spencer Brody (Inactive) [ 14/Jul/20 ]

Haha, of course it was my fault!  Why am I not surprised.

I wonder if there's a way we could "complete" the transition to primary, but without actually allowing writes in that we know are going to be rolled back?  Potentially by holding onto the global lock, or by never actually setting isMaster to true.  Could the stepUp complete but then immediately hand off to stepdown, without releasing locks?

Whatever we do here, I think it's important that once we notify external services that we are primary, if we then step down or never actually make it to fully functional primary, we need to inform those external services of the change.

Comment by Xuerui Fa [ 14/Jul/20 ]

SERVER-31572 seems to be related to this ticket. Previously, instead of returning an error code, the canCompleteTransitionToPrimary check would invariant that the check succeeds. If the node initiates unconditional step down while it is in signalDrainComplete, the invariant would have failed. In that ticket, it was proposed that we allow signalDrainComplete to finish and then step down unconditionally, or fail the transition to primary. It seems like the decision was made to fail the transition to primary, which is why this second check was added in. According to this comment, the first proposal was chosen so that there wouldn't be a brief period of time where the primary could accept writes, even though it would step down.

Comment by Xuerui Fa [ 14/Jul/20 ]

It seems like the check that might cause signalDrainComplete to fail after notifying external services is called twice within signalDrainComplete: first here and a second time when we call _topCoord->completeTransitionToPrimary. siyuan.zhou and I spent some time today exploring what might cause the check to pass the first time, but fail on the second call. The goal was to see if we could rely only on the first check, so that we could safely notify external services and then call completeTransitionToPrimary without the check.

In many instances, the leader mode is modified only after obtaining RSTL. In signalDrainComplete, we acquire the RSTL lock before we do the first check. As a result, in most instances, we are confident that the leader mode has not changed since the first check, and so we can remove the second check.

There are two exceptions to this, and both have to do with unconditional step down (here is one case. We modify leader mode in prepareForUnconditionalStepDown). In both cases, the leader mode is set to kSteppingDown before acquiring RSTL. Our current idea to resolve this is to modify the check so that it will allow the leader mode to be set to kSteppingDown, in addition to kLeaderElect and kAttemptingStepDown. This is because it is not possible for the rest of the unconditional step down code path to proceed, since we are holding RSTL. As a result, signalDrainComplete will finish with no problems, and only after releasing RSTL will the unconditional step down code path be allowed to continue. We will investigate more tomorrow to see if there are any issues with this solution.

Comment by Siyuan Zhou [ 01/Jul/20 ]

Marked to schedule on next triage meeting. Sorry I missed that "Investigating" probably didn't show up on the triage meeting.

Generated at Thu Feb 08 05:18:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.