- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Replication
- Fully Compatible
- v7.1, v7.0, v6.0, v5.0, v4.4
- Repl 2023-03-06, Repl 2023-03-20, Repl 2023-05-01, Repl 2023-05-15, Repl 2023-05-29, Repl 2023-06-12, Repl 2023-06-26, Repl 2023-07-24, Repl 2023-08-07, Repl 2023-08-21, Repl 2023-09-04, Repl 2023-09-18
SERVER-56756 added an fassert that crashes the server when it times out acquiring the RSTL lock on stepUp/stepDown. We currently dump all locks before the fassert, but the lock manager dump is sometimes insufficient for diagnosing the underlying issue. Most of the time, a core dump is needed to understand which ops are currently running and what they are doing. Ideally, we would just dump the stack traces (printAllThreadStacks), but that's not always feasible, especially on production builds. One alternative is to selectively dump currentOp (and maybe the session catalog as well).
SERVER-71521 is an improvement to currentOp that may help with this.
Update: see the conversation; we decided to dump all thread stacks.
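For illustration, the "dump diagnostics before failing" flow described above might look roughly like the sketch below. This is not the server's lock manager code: `AcquireResult`, `acquireOrDump`, and `runDemo` are hypothetical names, a `std::timed_mutex` stands in for the RSTL, and the dumpers are plain callbacks standing in for the lock manager dump and printAllThreadStacks. The sketch returns a result on timeout rather than calling an fassert, so the caller decides whether to abort.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical outcome type for illustration; not a server type.
enum class AcquireResult { kAcquired, kTimedOut };

// Tries to take `lock` within `timeout`. On timeout, runs every diagnostic
// dumper in order and returns kTimedOut, so the caller can fail afterwards
// with the diagnostics already written out.
AcquireResult acquireOrDump(std::timed_mutex& lock,
                            std::chrono::milliseconds timeout,
                            const std::vector<std::function<void()>>& dumpers) {
    assert(timeout.count() >= 0);
    if (lock.try_lock_for(timeout)) {
        return AcquireResult::kAcquired;  // caller now owns the lock and must unlock it
    }
    for (const auto& dump : dumpers) {
        dump();
    }
    return AcquireResult::kTimedOut;
}

// Demo: another thread holds the stand-in lock, so acquisition times out and
// both dumpers run. Returns the names of the dumps in the order they ran.
std::vector<std::string> runDemo() {
    std::timed_mutex rstl;  // stand-in for the RSTL (assumption)
    std::atomic<bool> held{false};
    std::atomic<bool> release{false};

    std::thread holder([&] {
        rstl.lock();
        held = true;
        while (!release) std::this_thread::yield();
        rstl.unlock();
    });
    while (!held) std::this_thread::yield();  // wait until the lock is really held

    std::vector<std::string> log;
    std::vector<std::function<void()>> dumpers = {
        [&] { log.push_back("locks"); },   // stands in for the lock manager dump
        [&] { log.push_back("stacks"); },  // stands in for printAllThreadStacks
    };
    acquireOrDump(rstl, std::chrono::milliseconds(50), dumpers);

    release = true;
    holder.join();
    return log;
}
```

Running the dumpers before returning mirrors the ordering the ticket cares about: the diagnostics are emitted first, and only then does the caller crash the process.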
- depends on
  - SERVER-76932 Add a way for a thread to know when the SignalHandler thread is done with printAllThreadStacks (Closed)
- is related to
  - SERVER-91012 Recommit SERVER-71520 (In Code Review)
  - SERVER-71521 Improve currentOp to include more progress info for read and crud operations (Closed)
- related to
  - SERVER-56756 Primary cannot stepDown when experiencing disk failures (Closed)
  - SERVER-90777 Revert SERVER-71520 (Closed)
  - SERVER-95647 Tell threads to dump CurOp info when lock acquisition times out (Closed)
  - SERVER-61251 Ensure long running storage engine operations are interruptible (Backlog)