[SERVER-28437] Hang analyzer GDB module mongo_lock.py should use a memory based mechanism to discover the locks Created: 22/Mar/17 Updated: 06/Dec/22 Resolved: 28/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jonathan Abrahams | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | PM-626, tig-hanganalyzer | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Server Tooling & Methods |
| Sprint: | TIG 2017-04-17 | ||||||||
| Participants: | |||||||||
| Description |
|
The lock discovery used in the Hang Analyzer GDB module mongo_lock.py invokes functions to retrieve lock data. This approach has the following side effects:
A preferable method would be to inspect memory directly, including the complex C++ std structures, for example by iterating over an ordered Map. Since this is a difficult undertaking, it may be easier to build on a third-party module, such as the Tr1UnorderedMapPrinter class found in /opt/mongodbtoolchain/v2/share/gcc-5.4.0/python/libstdcxx/v6/printers.py |
| Comments |
| Comment by Max Hirschhorn [ 28/Feb/22 ] |
|
Hey team, I went ahead and implemented this functionality in a new GDB extension library - https://github.com/visemet/gdb-mongodb-server. It hasn't been put through its paces like the hang analyzer has, so I anticipate there are certain cases (MongoDB toolchain version, compiler options, etc.) where it doesn't work. I did at least confirm I could dump out the LockManager from the core dump in BF-23292 (a relatively recent LockManager deadlock). |
| Comment by Brooke Miller [ 28/Jan/22 ] |
|
We don't have plans to evaluate memory-based debugging info extraction in the hang-analyzer. |
| Comment by Eddie Louie [ 24/Apr/17 ] |
|
Sorry for the delayed update. It took a bit of time understanding the Python code. Currently I've managed to get the pair structures printed (each element of the unordered_map is a <ResourceId, LockHead*> pair). I was a bit confused by the Tr1HashtableIterator code. It turns out the StdHashtableIterator was more appropriate to use. So instead of going through each thread one by one and dumping the LockManager locks, we could access the LockManager's _lockBuckets array and iterate through each bucket. Within each bucket we can access the data field, which is the unordered_map that contains all the LockHeads. Once we iterate through all the LockHeads, we can access the grantedList and conflictList. These lists contain the LockRequests that contain the holders and waiters for each lock. I have not integrated all the code yet. I'm looking into this next. Once that is done, we can remove the code that calls the findOrInsert function. |
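The traversal described in this comment can be sketched roughly as follows. This is an illustration only, not the code from mongo_lock.py: plain Python dicts stand in for gdb.Value objects so the sketch runs outside GDB. The field names _lockBuckets, data, grantedList and conflictList come from the comment; _front, next and locker are assumed names for the list head, the link pointer and the owner of a LockRequest. The iter_map parameter is a hypothetical hook where a real GDB script would plug in an iterator over the unordered_map (for instance one built on StdHashtableIterator).

```python
def iter_list(request_list):
    """Yield each LockRequest by following `next` links from `_front`.

    Assumption: the list head is `_front` and requests link via `next`.
    With real gdb.Value pointers the end-of-list check would compare the
    pointer against 0 instead of None.
    """
    node = request_list["_front"]
    while node is not None:
        yield node
        node = node["next"]


def collect_locks(lock_manager, iter_map):
    """Map each resource id to its holders and waiters without calling
    any function in the inspected process.

    `iter_map` (hypothetical hook) yields (resource_id, lock_head) pairs
    from one bucket's map.
    """
    locks = {}
    for bucket in lock_manager["_lockBuckets"]:
        for resource_id, lock_head in iter_map(bucket["data"]):
            locks[resource_id] = {
                "holders": [r["locker"] for r in iter_list(lock_head["grantedList"])],
                "waiters": [r["locker"] for r in iter_list(lock_head["conflictList"])],
            }
    return locks


def make_list(*lockers):
    """Build a stand-in singly linked request list for the example."""
    front = None
    for locker in reversed(lockers):
        front = {"locker": locker, "next": front}
    return {"_front": front}


# Stand-in data: two buckets, one lock with one holder and two waiters.
manager = {
    "_lockBuckets": [
        {"data": {"Global": {
            "grantedList": make_list("thread-1"),
            "conflictList": make_list("thread-2", "thread-3"),
        }}},
        {"data": {}},
    ]
}
result = collect_locks(manager, lambda m: m.items())
```

Because the walk only reads fields, the same shape should in principle work against a core dump, which is the motivation given elsewhere in this ticket.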
| Comment by Max Hirschhorn [ 19/Apr/17 ] |
|
eddie.louie, could you post an update with how far you've gotten in exploring the LockManager structures without calling functions (e.g. using Tr1UnorderedMapPrinter)? I'd like to know how close we think we are to making it possible to run on a core dump. |
| Comment by Max Hirschhorn [ 23/Mar/17 ] |
|
Given that this would also resolve the issue we're having with calling functions on Solaris and would enable support for running the mongodb-show-locks and mongodb-waitsfor-graph GDB commands on core dumps, I think we should spend at least a week trying to get this to work. If we discover that this is too difficult for some reason, then we can move it back to the backlog. |