[SERVER-41355] Step down should call yieldLocksForPreparedTransactions w/o holding repl mutex lock (ReplicationCoordinatorImpl::_mutex). Created: 29/May/19  Updated: 29/Oct/23  Resolved: 15/Jul/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.2.0-rc3, 4.3.1

Type: Task Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-41317 Push commitTransaction's check for a ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Repl 2019-06-03, Repl 2019-06-17, Repl 2019-07-01, Repl 2019-07-15, Repl 2019-07-29
Participants:
Linked BF Score: 10

 Description   

Currently, step down calls yieldLocksForPreparedTransactions by holding both RSTL and repl mutex lock. As a result, this can deadlock with prepared txn threads that have checked out the session. Consider the below case. 

1) Thread A (txn cmd) has checked out the session.
2) Step down has acquired RSTL lock and repl mutex lock.
3) Step down calls yieldLocksForPreparedTransactions which marks the thread A as killed as it has checked out the session and its transaction state is TransactionState::kPrepared
4) Thread A tries to acquire repl mutex lock which is held by step down thread.
5) Step down waits for thread A to check in the session, so that it can check out the session and perform lock yielding of that prepared txn. But, thread A can't check in the session as it waiting for the repl mutex lock which is not interruptible.

SERVER-41317 describes a problem happened due to above scenario.



 Comments   
Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-41355 Step down calls yieldLocksForPreparedTransactions without
holding repl mutex.

(cherry picked from commit cc1a75e4a6d8de8478e7253da7bd6376052d57a6)
Branch: v4.2
https://github.com/mongodb/mongo/commit/79bd2579429eb6827e2e87e0321d6cad32395c6a

Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}

Message: SERVER-41355 Step down calls yieldLocksForPreparedTransactions without
holding repl mutex.
Branch: master
https://github.com/mongodb/mongo/commit/cc1a75e4a6d8de8478e7253da7bd6376052d57a6

Comment by Suganthi Mani [ 18/Jun/19 ]

Investigation:
Currently, step down cmd (conditional step) down should not release the repl mutex lock before calling yieldLocksForPreparedTransactions(). Else, concurrent step ups or step downs can happen which can lead to server crash.

For step down cmd, we call yieldLocksForPreparedTransactions() only after stepping down (i.e. TopologyCoordinator::_role is set to Role::kFollower and TopologyCoordinator::_leaderMode is set to LeaderMode::kNotLeader) but the member state in the replicationCoordinator is not yet updated.

Concurrent step up: 

  • If we release the repl mutex lock after stepping down and before yieldLocksForPreparedTransactions(), a concurrent step up can start an election because the role of the node is no longer a leader or candidate. Even though, the step up gets struck before starting a real election (as it needs to persist the last vote info in disk which requires RSTL in IX mode but gets blocked behind step down which holds RSTL lock in X mode), the server crash can happen. Since, to start an election the step up changes the TopologyCoordinator::_role to kCandidate and now if _handleHeartbeatResponse() code path notices that the member state maintained by replicationCoordinator and topologyCoordinator are not same, it will try to update the replicationCoordinator's member state which can lead to this invariant failure.

Concurrent step down:

  • Since, the member state of replicationCoordinator is still primary, step down triggered by force reconfig or heatbeat reconfig can occur and can lead to an unsafe/invalid state where the node's role reflects kFollower but the leaderMode in kSteppingDown.

To be noted, releasing repl mutex lock before calling yieldLocksForPreparedTransactions() is not a problem for unconditional step down code paths, as we haven't stepped down (i.e. _role or _leader value of the topologyCoordinator are not yet changed)before yieldLocksForPreparedTransactions(). So, no concurrent step ups or step downs can happen.

 Solution:
In order to make step down code paths to call yieldLocksForPreparedTransactions() method w/o holding repl mutex lock, the step down cmd (conditional step down) should perform below sequence.

1) TopologyCoordinator::attemptStepDown should not perform step down (i.e. should not change the role or leaderMode), instead it will upgrade its status from conditional step down to unconditional step down (i.e the leaderMode will gets transitioned from kAttemptingStepDown to kSteppingDown). By doing this, we prevent any concurrent step ups and step downs. Also, we guarantee that the step down cmd won't fail after this point and safe to release the repl mutex lock w/ RSTL lock held.
2) Release the repl Mutex lock
3) Call yieldLocksForPreparedTransactions ().
4) Reacquire the repl mutex lock.
5) Now perform step down (i.e. call finishUnconditionalStepDown()).

Generated at Thu Feb 08 04:57:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.