[SERVER-40487] Stop running the RstlKillOpthread when a node is no longer primary Created: 04/Apr/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Suganthi Mani Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-37574 Force reconfig should kill user opera... Closed
Issue split
split to SERVER-41283 Add test that running stepdown on sec... Closed
Related
related to SERVER-40700 Deadlock between read prepare conflic... Closed
is related to SERVER-37348 TransactionReaper and periodic transa... Closed
Assigned Teams:
Replication
Sprint: Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20
Participants:

 Description   

Currently, when 2 concurrent step downs are triggered (a combination of a conditional and an unconditional step down, or 2 conditional step downs), the step down thread can kill transaction operations being processed by secondary oplog application.

Consider the below scenario and assume that node A is in primary state.
1) User executes replSetStepDown cmd (Thread X).
2) Thread X is at this line.
3) Now, node A notices via heartbeat that a new term has begun. So, node A steps down via the unconditional stepdown code path.
4) Now the state of node A will be SECONDARY.
5) Node A's oplog application tries to apply the prepare/commit oplog entry. This requires secondary oplog application to check out the session. Let's assume oplog application thread Y tries to apply the commit oplog entry and is at this line.
6) A read operation comes in (Thread Z) and acquires the RSTL lock in IX mode and the global lock in IS mode. It is blocked by thread Y due to a prepare conflict (a conflict at the document lock).
7) Thread X resumes and enqueues the RSTL lock in X mode, as it is blocked by the read thread (thread Z).
8) Thread X starts "RstlKillOpthread". Now, RstlKillOpthread marks thread Y (which belongs to secondary oplog application) as killed as part of killSessionsAbortUnpreparedTransactions.



 Comments   
Comment by Judah Schvimer [ 13/Jun/19 ]

We will re-evaluate this after PM-1455. Jason Carey has a plan to improve how operations choose to run only as a primary. We should sync up with him before starting this work.

Comment by Judah Schvimer [ 22/May/19 ]

This ticket will be for the optimization to kill fewer readers.

Comment by Judah Schvimer [ 21/May/19 ]

We will also double check in the deadlock fix design that there were no other deadlocks that we were concerned about.

I have done this check and don't see anything further to do besides commit the test Suganthi wrote.

Comment by Judah Schvimer [ 20/May/19 ]

In 4.0, we won't get into this problem as we don't support cross-shard transactions.

We avoid some of the problem, but not necessarily all of it. The concern with killing secondary oplog application is still there in 4.0, since it was not transaction specific. I think we're safe because in both 4.2 and 4.0 we only kill user operations, and secondary oplog application is not a user operation, so it won't be killed.
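
To make the "only kill user operations" argument concrete, here is a minimal sketch in Python. It is a toy model, not server code: the `Op` class, the `is_user_op` flag, and `kill_user_ops` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    is_user_op: bool  # hypothetical flag; oplog application is modelled as False
    killed: bool = False


def kill_user_ops(ops):
    """Mark only user operations as killed and return the names that were killed."""
    killed = []
    for op in ops:
        if op.is_user_op and not op.killed:
            op.killed = True
            killed.append(op.name)
    return killed


ops = [
    Op("thread Y (secondary oplog application)", is_user_op=False),
    Op("thread Z (read)", is_user_op=True),
]
killed = kill_user_ops(ops)
```

Under this assumption, a kill pass during step down interrupts the read thread and leaves the oplog application thread alone.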

Comment by Judah Schvimer [ 16/May/19 ]

We have to check if a secondary could see kInProgress for a transaction here.

It is possible; however, when we try to abort the transaction we will check out the session. We invariant here that the transaction is in a "prepared" state before it is checked back in during secondary oplog application. When we try to abort a prepared transaction during stepdown, we ignore the attempt and leave the transaction in prepare. This was all investigated thoroughly in SERVER-37348 for the TransactionReaper and abort threads.
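
The "ignore the abort attempt on a prepared transaction" behaviour can be sketched as a tiny state function. This is an illustrative model only: the `TxnState` enum, its member names other than kInProgress, and `abort_if_unprepared` are assumptions, not the server's actual types.

```python
from enum import Enum


class TxnState(Enum):
    IN_PROGRESS = "kInProgress"
    PREPARED = "prepared"   # hypothetical label for the prepared state
    ABORTED = "aborted"     # hypothetical label for the aborted state


def abort_if_unprepared(state):
    """Abort an in-progress transaction; leave a prepared one untouched."""
    if state is TxnState.IN_PROGRESS:
        return TxnState.ABORTED
    # Prepared (or already aborted) transactions are left as they are.
    return state


after_prepared = abort_if_unprepared(TxnState.PREPARED)
after_in_progress = abort_if_unprepared(TxnState.IN_PROGRESS)
```

In this model, the step down abort path can never move a transaction out of prepare, which is the property the comment relies on.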

Comment by Judah Schvimer [ 16/May/19 ]

We have to check if a secondary could see kInProgress for a transaction here.

We do check if we're secondary before we yield locks for prepared transactions.

We will also double check in the deadlock fix design that there were no other deadlocks that we were concerned about.

Comment by Suganthi Mani [ 16/May/19 ]

Spoke to judah.schvimer; we won't be doing the optimization now, but we will add a jstest for this scenario.

Comment by Suganthi Mani [ 15/May/19 ]

judah.schvimer,

Currently a step down thread can wait for RSTL lock when the node has stepped down and transitioned to secondary state. Consider the below concurrent step down code paths.

1) Unconditional step down & Conditional step down (step down cmd).

2) Force reconfig & step down via heartbeat.

3) Step down via heartbeat & Force reconfig.

In each of the 3 cases above, the former wins, leaving the latter to enqueue the RSTL, start the killop thread, and wait for the RSTL lock while the node is in secondary state.

As per the new approach (SERVER-40700), even though our step downs take the RSTL lock in X mode, we kill read operations blocked on a prepared txn (secondary oplog application). This means we won't get the 3-way deadlock (problem No:2).

The next question is whether we need the optimization of not killing read operations when the node is no longer primary. Currently, I have 2 solutions, and both of them have some flaws/complications.

1) (WIP) Listener model - where step down registers a listener and the winner should interrupt all the listeners. When the winner interrupts the listeners, it has to kill each listener's killop thread and mark the listener thread as killed, which means this solution has 2 problems:

      - It can kill the stepdown via hb (case 2 - internal operation) and the reconfig cmd (case 3). And, that's not correct.

      - For case 1, since we can take the RSTL lock in an uninterruptible lock guard, we can't make the wait for lock acquisition interruptible, and there is a possibility of a 3-way deadlock.

2) After every iteration, check in the killop thread whether the member state is primary. If not, break the loop (i.e. stop the killop thread). Now comes the complication of stopping the step down thread from waiting for the RSTL lock, as ReplicationStateTransitionLockGuard is not thread safe. So, the fix is not going to be straightforward.
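
A rough sketch of solution 2, written in Python as a toy model rather than server code. The function `run_kill_op_loop`, its callback parameters, and the string state names are all hypothetical; the sketch covers only the loop exit and does not address the complication of stopping the step down thread's wait on the RSTL lock.

```python
def run_kill_op_loop(get_member_state, kill_user_ops, rstl_acquired, max_iterations=1000):
    """Run kill passes until the RSTL is acquired or the node stops being primary.

    Returns a string naming the reason the loop stopped.
    """
    for _ in range(max_iterations):
        if rstl_acquired():
            return "rstl_acquired"
        kill_user_ops()
        # After every iteration, check whether the node is still primary.
        if get_member_state() != "PRIMARY":
            return "no_longer_primary"
    return "iteration_limit"


# Simulated run: the node transitions to SECONDARY on the third check.
states = iter(["PRIMARY", "PRIMARY", "SECONDARY"])
kill_passes = []
reason = run_kill_op_loop(
    get_member_state=lambda: next(states),
    kill_user_ops=lambda: kill_passes.append("pass"),
    rstl_acquired=lambda: False,
)
```

In this simulation the loop performs three kill passes and then exits because the member state is no longer primary.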

Given the complications, is it worth implementing the optimization? To be noted, we kill the read operations with the retryable error code ErrorCodes.InterruptedDueToStepDown. Since this killing of read operations on a secondary happens only for a brief window, I feel it's ok to close this ticket or decrease its priority.
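
Because the error is retryable, a caller can simply reissue the read. The sketch below illustrates that idea in Python; the `InterruptedDueToStepDown` exception class and `read_with_retry` helper are hypothetical stand-ins and not any driver's API.

```python
class InterruptedDueToStepDown(Exception):
    """Hypothetical stand-in for the retryable server error."""


def read_with_retry(do_read, max_attempts=3):
    """Call do_read, retrying when it is interrupted by a step down."""
    for attempt in range(1, max_attempts + 1):
        try:
            return do_read()
        except InterruptedDueToStepDown:
            if attempt == max_attempts:
                raise


# Simulated read that is killed once during the brief step down window.
calls = []


def flaky_read():
    calls.append(1)
    if len(calls) == 1:
        raise InterruptedDueToStepDown()
    return {"_id": 1}


result = read_with_retry(flaky_read)
```

Here the first attempt is interrupted and the second succeeds, so the caller only sees a short delay.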

Note: In 4.0, we won't get into this problem as we don't support cross-shard transactions.

Comment by Suganthi Mani [ 07/May/19 ]

The step down thread will now take the RSTL in X mode rather than S mode. So, the problem described in this ticket is still valid.

Comment by Suganthi Mani [ 24/Apr/19 ]

After SERVER-40700, problem No:2 (3-way deadlock) is also not possible, as the step down thread is going to take the lock in S mode, which should not conflict with the read thread. So, there is nothing to do for this ticket, and I am marking this ticket as depends on SERVER-40700.

Comment by Suganthi Mani [ 24/Apr/19 ]

Currently, we see 2 problems here.
Problem No:1 Operations applied during secondary oplog application can be marked killed by step down.

Problem No:2 If the secondary oplog application operations aren't killed, it can lead to a 3-way deadlock.

  • Read thread (thread Z) blocked by prepared transaction thread Y (secondary oplog application).
  • Step down (thread X) blocked by read thread.
  • Prepared transaction (thread Y) blocked by step down thread.
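
The three "blocked by" relationships above form a cycle in a wait-for graph. A minimal Python sketch of that check follows; the `find_cycle` helper and the thread labels are illustrative assumptions, and each thread is assumed to wait on at most one other thread.

```python
def find_cycle(waits_for):
    """Return the threads forming a wait-for cycle, or None if there is none.

    waits_for maps each blocked thread to the single thread it is waiting on.
    """
    for start in waits_for:
        path = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


waits_for = {
    "Z (read)": "Y (oplog application)",
    "X (step down)": "Z (read)",
    "Y (oplog application)": "X (step down)",
}
cycle = find_cycle(waits_for)

# If the read thread is killed, the step down no longer waits on it
# and the cycle disappears.
without_read = {"Y (oplog application)": "X (step down)"}
no_cycle = find_cycle(without_read)
```

With all three edges present the function reports a cycle containing X, Y, and Z; removing the read thread's edges breaks it.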

To solve it, we need to implement a way such that the stepdown thread stops running the RstlKillOpthread and waiting for RSTL lock when the node is no longer in PRIMARY state.

Generated at Thu Feb 08 04:55:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.