[SERVER-36727] Dump open transactions and currentOp info in hang analyzer Created: 17/Aug/18 Updated: 06/Dec/22 Resolved: 17/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | prepare_optional, prepare_testing, tig-hanganalyzer |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: |
Replication
|
| Participants: | |||||||||||||||||
| Description |
|
If a test hangs because a transaction was leaked, we currently have no way to tell which transaction caused the problem. |
| Comments |
| Comment by Judah Schvimer [ 22/Aug/18 ] |
|
I think lock acquisition is a fair concern. Adding a timeout to the currentOp command, or having the caller kill the currentOp once a timeout expires, would sufficiently handle that while still getting valuable information when it is not blocked by a lock. |
| Comment by Max Hirschhorn [ 22/Aug/18 ] |
|
My concern is how successful we'd be, in a hang scenario, at logging any open transactions or currentOp() output. Do you think the server commands would block trying to acquire some mutex that will never be released? If you don't foresee that being an issue, then the approach I'd take is to have hang_analyzer.py (1) learn the port of the mongod process by inspecting its process arguments, (2) spawn a mongo shell process to run the desired server commands, and (3) wait up to a certain amount of time (e.g. 10 seconds) for the mongo shell process to exit. (We can send the mongo shell process a SIGKILL if it doesn't exit on its own.) I'd say that we'd be better served by running the desired server commands before attaching with a debugger, as I wouldn't trust the process state after it's been perturbed by calling C++ functions via gdb. |
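
The three steps above could look roughly like the following. This is only a minimal sketch and not code from hang_analyzer.py: the helper names `parse_port`, `run_with_deadline`, and `dump_current_op` are hypothetical, the mongod argument list is assumed to have been obtained already (for example from the process table), and the port is assumed to be given on the command line rather than in a config file.

```python
import subprocess
import sys

DEFAULT_MONGOD_PORT = 27017  # mongod's default port when --port is not given


def parse_port(argv):
    """Step 1: find the port in a mongod argument list (--port N or --port=N)."""
    for i, arg in enumerate(argv):
        if arg == "--port" and i + 1 < len(argv):
            return int(argv[i + 1])
        if arg.startswith("--port="):
            return int(arg.split("=", 1)[1])
    return DEFAULT_MONGOD_PORT


def run_with_deadline(cmd, timeout_secs=10):
    """Steps 2 and 3: spawn cmd and wait up to timeout_secs for it to exit.

    Returns (timed_out, output). If the deadline passes, the child is
    killed (SIGKILL on POSIX) and whatever it printed so far is returned.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        output, _ = proc.communicate(timeout=timeout_secs)
        return False, output
    except subprocess.TimeoutExpired:
        proc.kill()
        output, _ = proc.communicate()  # reap the child and drain its pipe
        return True, output


def dump_current_op(mongod_argv, timeout_secs=10):
    """Hypothetical glue: run db.currentOp() through the mongo shell."""
    port = parse_port(mongod_argv)
    cmd = [
        "mongo", "--quiet", "--port", str(port),
        "--eval", "printjson(db.currentOp())",
    ]
    return run_with_deadline(cmd, timeout_secs)
```

Because the deadline is enforced on the shell process rather than on the server command, a currentOp() that blocks on a lock costs at most `timeout_secs` before the hang analyzer moves on to attaching the debugger.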
| Comment by Judah Schvimer [ 17/Aug/18 ] |
|
The idea was the former. Do you have an idea for an easier way to accomplish the same goal? |
| Comment by Max Hirschhorn [ 17/Aug/18 ] |
|
judah.schvimer, is the idea to do this by connecting to the mongod process with a socket and running some commands, or to call a function / walk some in-memory data structure? The existing gdb scripts are only prepared to do the latter. |