[SERVER-59226] Deadlock when stepping down with a profile session marked as uninterruptible Created: 10/Aug/21  Updated: 29/Oct/23  Resolved: 04/Sep/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.11, 4.2.18, 5.0.4, 5.1.0-rc0

Type: Bug Priority: Critical - P2
Reporter: Vishnu Kaushik Assignee: Wenbin Zhu
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-63143 Operation can be interrupted by maxTi... Closed
Related
related to SERVER-60161 Deadlock between config server stepdo... Closed
related to SERVER-59673 Investigate better solutions for fixi... Closed
is related to SERVER-57756 Race between concurrent stepdowns and... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4, v4.2, v4.0
Sprint: Repl 2021-08-23, Repl 2021-09-06
Participants:
Case:

 Description   

When a primary is stepping down, when calling _stepDownFinish it kills all sessions through invalidateSessionsForStepDown -> killSessionsAction -> checkOutSessionForKill. This is after it has grabbed the RSTL via AutoGetRstlForStepUpStepDown.

In the process of stepping down, if it is trying to kill an already checked out session, there is a potential for deadlock, as it needs to wait until the session is checked back in.

The session will end up getting interrupted when it tries to grab a lock such as the GlobalLock. However, if the session's opCtx is marked as uninterruptible, then it is possible that the checked out session is waiting on the GlobalLock, while the step down thread (which has the RSTL) is waiting on the checked out session, causing a deadlock.

This is possible when profiling. In general, it may be possible with other uses of the UninterruptibleLockGuard.



 Comments   
Comment by Githook User [ 01/Nov/21 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: Revert "SERVER-59226 Fix deadlock between uninterruptible profiling operation and stepdown thread."

This reverts commit d1a9e68d581f8e3162bc790bdb30986b8d030d86.
Branch: v4.0
https://github.com/mongodb/mongo/commit/8443b63b12e53d21590b1f7dcf02c5df26b9ff40

Comment by Githook User [ 01/Nov/21 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-59226 Fix deadlock between uninterruptible profiling operation and stepdown thread.

(cherry picked from commit 43479818bd01f27ee25b6e992045529d2ac0185a)
Branch: v4.0
https://github.com/mongodb/mongo/commit/d1a9e68d581f8e3162bc790bdb30986b8d030d86

Comment by Githook User [ 21/Oct/21 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-59226 Fix deadlock between uninterruptible profiling operation and stepdown thread.

(cherry picked from commit 43479818bd01f27ee25b6e992045529d2ac0185a)
Branch: v4.2
https://github.com/mongodb/mongo/commit/46d180e4f463f96b7d5f64c73e43204fd47f5fd4

Comment by Githook User [ 20/Oct/21 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-59226 Fix deadlock between uninterruptible profiling operation and stepdown thread.

(cherry picked from commit 43479818bd01f27ee25b6e992045529d2ac0185a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/73ab1b397b1292dd791775e9f2f73fb0b3b4699b

Comment by Githook User [ 14/Oct/21 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-59226 Fix deadlock between uninterruptible profiling operation and stepdown thread.
Branch: v5.0
https://github.com/mongodb/mongo/commit/ed83e49f6388e1c1260b610aff0c54fdd66389c9

Comment by Githook User [ 04/Sep/21 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-59226 Fix deadlock between uninterruptible profiling operation and stepdown thread.
Branch: master
https://github.com/mongodb/mongo/commit/43479818bd01f27ee25b6e992045529d2ac0185a

Comment by Wenbin Zhu [ 30/Aug/21 ]

After discussing with execution team, we decided to push the fix for now, and later investigate the ideal solutions, such as removing RSTL for non-replicated write and profiling in a separate thread/executor. I have created SERVER-59673 to track the investigation.

Comment by Andy Schwerin [ 26/Aug/21 ]

OperationContext shouldn't have replication-specific member fields. It's a layer violation, as can be seen by the fact that there are no replication concepts in mongos.

Comment by Lingzhi Deng [ 26/Aug/21 ]

I think we used UninterruptibleLockGuard because we want to output the profiler entry to system.profile even if the opCtx is interrupted (e.g. the operation times out). system.profile is a non-replicate collection. And ideally, we don't actually need to acquire RSTL for non-replicate writes and writes on system.profile shouldn't get in the way of repl state transitions. But it might be difficult to skip RSTL in the write paths today. So wenbin.zhu's patch instead is to remove the UninterruptibleLockGuard but add a flag to opCtx to ignore interruptions except InterruptedDueToReplStateChange.

Comment by Andy Schwerin [ 26/Aug/21 ]

Why is the profiling code uninterruptible in the first place?

Comment by Wenbin Zhu [ 18/Aug/21 ]

And to answer josef.ahmad's question, I believe 4.0 is also impacted, we should be  able to have a 4.0 backport.

Comment by Wenbin Zhu [ 17/Aug/21 ]

I have successfully reproduced this issue in test. This only happens to retryable writes as mentioned in the previous comment. But note that only retryable updates and deletes are affected, not for inserts. This is because for inserts, the profile operation is called in `service_entry_point_common.cpp`, while for updates and deletes, the profile operation is called in `write_ops_exec.cpp`.  The difference is that for inserts, when profile() is called, the session has already been checked in, so it will not block the stepdown thread on checking out session even though the opCtx is marked uninterruptible, while for updates and deletes, the session is still in checked out state when profile() is being called, which would cause deadlocks with stepdown  thread. 

I'm wondering if updates and deletes can move the profile functionality after checking back in session, like insert does, as an optimization, not related to this ticket though.

Comment by Wenbin Zhu [ 16/Aug/21 ]

This seems to only happen with retryable writes, not with transactions, because in transactions, the opCtx's locker would have the maxLockTimeout set (the `maxTransactionLockRequestTimeoutMillis` server parameter and by default 5ms). And when profiling, if the maxLockTimeout is set, we don't make the opCtx uninterruptible. Will try to reproduce the issue. The proposed fix could be making the opCtx interruptible when maxLockTimeout is not set and retry if the operation is interrupted until success, but don't retry when maxLockTimeout is set to keep the same behavior as today.

Generated at Thu Feb 08 05:46:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.