[SERVER-36727] Dump open transactions and currentOp info in hang analyzer Created: 17/Aug/18  Updated: 06/Dec/22  Resolved: 17/May/19

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: prepare_optional, prepare_testing, tig-hanganalyzer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-39245 Log "verbose" session information in ... Closed
is related to SERVER-38045 Dump session catalog using GDB scripting Closed
is related to SERVER-36726 Log SessionID when we start a session... Closed
Assigned Teams:
Replication
Participants:

 Description   

If a test hangs due to a transaction being leaked, we currently have no idea what transaction caused the problem.



 Comments   
Comment by Judah Schvimer [ 22/Aug/18 ]

I think lock acquisition is a fair concern. Adding a timeout on the currentOp command, or having the caller kill the currentOp after a timeout, would sufficiently handle that while still getting valuable information when it is not blocked by a lock.

Comment by Max Hirschhorn [ 22/Aug/18 ]

My concern is how successful we'd be at logging any open transactions or currentOp() output in a hang scenario - do you think the server commands would block trying to acquire some mutex that'll never be released? If you don't foresee that being an issue, then the approach I'd take is to have hang_analyzer.py (1) learn the port of the mongod process by inspecting its process arguments, (2) spawn a mongo shell process to run the desired server commands, and (3) wait up to a certain amount of time (e.g. 10 seconds) for the mongo shell process to exit. (We can send the mongo shell process a SIGKILL if it doesn't exit on its own.) I'd say that we'd be better served by running the desired server commands before attaching with a debugger, as I wouldn't trust the process state after it has been perturbed by calling C++ functions via gdb.
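The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual hang_analyzer.py code: the helper names (`parse_port`, `read_cmdline`, `build_shell_command`, `run_with_timeout`) are hypothetical, reading `/proc/<pid>/cmdline` assumes Linux, and the sketch assumes the port is passed as `--port` on the command line rather than set in a config file.

```python
import subprocess
from typing import List, Optional, Tuple

DEFAULT_MONGOD_PORT = 27017  # assumed fallback when --port is absent


def parse_port(argv: List[str], default: int = DEFAULT_MONGOD_PORT) -> int:
    """Step 1: find the port in a mongod command line."""
    for i, arg in enumerate(argv):
        if arg == "--port" and i + 1 < len(argv):
            return int(argv[i + 1])
        if arg.startswith("--port="):
            return int(arg.split("=", 1)[1])
    return default


def read_cmdline(pid: int) -> List[str]:
    """Read a process's arguments from /proc (Linux-only assumption)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return [a.decode() for a in f.read().split(b"\0") if a]


def build_shell_command(port: int) -> List[str]:
    """Step 2: a mongo shell invocation that prints currentOp output."""
    # db.currentOp(true) also includes idle connections and operations.
    return ["mongo", "--port", str(port), "--quiet",
            "--eval", "printjson(db.currentOp(true))"]


def run_with_timeout(cmd: List[str],
                     timeout_secs: float = 10.0) -> Tuple[Optional[int], str, bool]:
    """Step 3: wait for the child; kill it if it does not exit in time.

    Returns (returncode, combined output, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        output, _ = proc.communicate(timeout=timeout_secs)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        output, _ = proc.communicate()  # reap the child, collect partial output
        return proc.returncode, output, True
```

A caller would run something like `run_with_timeout(build_shell_command(parse_port(read_cmdline(pid))))` before attaching the debugger, and log whatever output was collected even when the shell had to be killed.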

CC mark.benvenuto

Comment by Judah Schvimer [ 17/Aug/18 ]

The idea was the former. Do you have an idea for an easier way to accomplish the same goal?

Comment by Max Hirschhorn [ 17/Aug/18 ]

judah.schvimer, is the idea to do this by connecting to the mongod process with a socket and running some commands, or to call a function / walk some in-memory data structure? The existing gdb scripts are only prepared to do the latter.
