[SERVER-40487] Stop running the RstlKillOpthread when a node is no longer primary Created: 04/Apr/19 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Sprint: | Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20 |
| Participants: |
| Description |
|
Currently, when two concurrent step downs are triggered (either a conditional and an unconditional step down, or two conditional step downs), the step down thread can kill transaction operations processed by secondary oplog application. Consider the scenario below, assuming that node A is in primary state. |
| Comments |
| Comment by Judah Schvimer [ 13/Jun/19 ] |
|
We will re-evaluate this after PM-1455. Jason Carey has a plan to improve how operations choose to run only as a primary. We should sync up with him before starting this work. |
| Comment by Judah Schvimer [ 22/May/19 ] |
|
This ticket will be for the optimization to kill fewer readers. |
| Comment by Judah Schvimer [ 21/May/19 ] |
I have done this check and don't see anything further to do besides commit the test Suganthi wrote. |
| Comment by Judah Schvimer [ 20/May/19 ] |
We avoid some of the problem in 4.0, but not necessarily all of it. The concern with killing secondary oplog application still applies in 4.0 since it was not transaction specific. I think we're safe because in both 4.2 and 4.0 we only kill user operations, and secondary oplog application is not a user operation, so it won't be killed. |
| Comment by Judah Schvimer [ 16/May/19 ] |
It is possible; however, when we try to abort the transaction we will check out the session. We invariant here that the transaction is in a "prepared" state before it is checked back in during secondary oplog application. When we try to abort a prepared transaction during stepdown, we will ignore the attempt and leave the transaction in prepare. This was all investigated thoroughly in
| Comment by Judah Schvimer [ 16/May/19 ] |
|
We have to check if a secondary could see kInProgress for a transaction here. We do check if we're secondary before we yield locks for prepared transactions. We will also double check in the deadlock fix design that there were no other deadlocks that we were concerned about. |
| Comment by Suganthi Mani [ 16/May/19 ] |
|
Spoke to judah.schvimer; we won't be doing the optimization now, but we will add a jstest for this scenario. |
| Comment by Suganthi Mani [ 15/May/19 ] |
|
Currently, a step down thread can wait for the RSTL lock after the node has already stepped down and transitioned to secondary state. Consider the following concurrent step down code paths:
1) Unconditional step down & conditional step down (step down cmd).
2) Force reconfig & step down via heartbeat.
3) Step down via heartbeat & force reconfig.
In the above 3 cases, the former (green) wins, pushing the latter (red) to enqueue the RSTL, start the killOp thread, and wait for the RSTL lock while the node is in secondary state. As per the new approach (
The next question is whether we need the optimization of not killing read operations when the node is no longer primary. Currently, I have 2 solutions, and both have some flaws/complications.
1) (WIP) Listener model: step down registers a listener, and the winner interrupts all the listeners. When it interrupts the listeners, it has to kill each listener's killOp thread and mark the listener thread as killed, which means this solution has 2 problems:
- It can kill the stepdown via hb (case 2, an internal operation) and the reconfig cmd (case 3), which is not correct.
- For case 1, since we can take the RSTL lock in an uninterruptible lock guard, we can't make the wait for lock acquisition interruptible, and there is a possibility of a 3-way deadlock.
2) After every iteration, check in the killOp thread whether the member state is primary. If not, break the loop (i.e., stop the killOp thread). The complication is then stopping the step down thread from waiting for the RSTL lock, as ReplicationStateTransitionLockGuard is not thread safe. So the fix is not going to be straightforward.
Given these complications, is it worth implementing the optimization? Note that we kill the read operations with the retryable error code ErrorCodes.InterruptedDueToStepDown. Since this killing of read operations on a secondary happens only for a brief window, I feel it's ok to close this ticket or decrease its priority.
Note: In 4.0, we won't get into this problem as we don't support cross-shard transactions. |
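For illustration, solution 2 in the comment above could be sketched roughly as follows. This is a standalone toy, not the server's code: `MemberState`, `runKillOpLoop`, `stopRequested`, and `killConflictingOps` are hypothetical names, and the real kill-op thread and member state tracking are more involved. The sketch checks the state before each pass, so a thread started after the node has already left primary does no killing at all.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the node's replica set member state.
enum class MemberState { kPrimary, kSecondary };

// Repeatedly invokes killConflictingOps until either the caller sets
// stopRequested (e.g. the stepdown thread acquired its lock) or the node is
// observed to be no longer primary. Returns the number of kill passes run.
int runKillOpLoop(const std::atomic<MemberState>& state,
                  const std::atomic<bool>& stopRequested,
                  const std::function<void()>& killConflictingOps,
                  std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
    int passes = 0;
    while (!stopRequested.load()) {
        // Proposed optimization: if another stepdown already won and the node
        // has left PRIMARY, stop killing operations.
        if (state.load() != MemberState::kPrimary) {
            break;
        }
        killConflictingOps();
        ++passes;
        std::this_thread::sleep_for(interval);
    }
    return passes;
}
```

This only covers the kill-op loop. It does not address the harder part called out above, namely getting the stepdown thread itself to stop waiting for the RSTL.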
| Comment by Suganthi Mani [ 07/May/19 ] |
|
Step down thread is no longer going to take RSTL in S mode but in X mode. So, the problem described in this ticket is still valid. |
| Comment by Suganthi Mani [ 24/Apr/19 ] |
|
After, |
| Comment by Suganthi Mani [ 24/Apr/19 ] |
|
Currently, we see 2 problems over here.
Problem No. 2: If secondary oplog application isn't killed, it can lead to a 3-way deadlock.
To solve it, we need a way for the stepdown thread to stop running the RstlKillOpthread and stop waiting for the RSTL lock once the node is no longer in PRIMARY state. |
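As a rough illustration of the "stop waiting for the lock" half of this idea, the sketch below polls a lock in short slices and abandons the wait once the node is observed to have left primary. It is an assumption-laden toy: a `std::timed_mutex` stands in for the RSTL, and `MemberState` and `acquireWhilePrimary` are hypothetical names. The real lock guard has different semantics and cannot simply be swapped for a timed mutex.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>

// Hypothetical stand-in for the node's replica set member state.
enum class MemberState { kPrimary, kSecondary };

// Tries to acquire the lock in short time slices, re-checking the member
// state between attempts. Returns true if the lock was acquired (the caller
// must unlock it), or false if the wait was abandoned because the node is no
// longer primary.
bool acquireWhilePrimary(std::timed_mutex& lock,
                         const std::atomic<MemberState>& state,
                         std::chrono::milliseconds slice = std::chrono::milliseconds(5)) {
    while (state.load() == MemberState::kPrimary) {
        if (lock.try_lock_for(slice)) {
            return true;
        }
    }
    return false;
}
```

The state can still change right after the lock is acquired, so a caller would need to re-check it while holding the lock.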