[SERVER-28437] Hang analyzer GDB module mongo_lock.py should use a memory based mechanism to discover the locks Created: 22/Mar/17  Updated: 06/Dec/22  Resolved: 28/Jan/22

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Jonathan Abrahams Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Won't Do Votes: 0
Labels: PM-626, tig-hanganalyzer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-38045 Dump session catalog using GDB scripting Closed
Assigned Teams:
Server Tooling & Methods
Sprint: TIG 2017-04-17
Participants:

 Description   

The lock discovery used in the Hang Analyzer GDB module mongo_lock.py invokes functions in the target process to retrieve lock data. This approach has the following side effects:

  • It works only against a live process, not against a core dump
  • While a called function runs in the inferior, other threads can be scheduled, which may change the state of the lock tables

A preferable method would be to inspect memory directly, including the complex C++ standard library structures, e.g. by iterating over a std::unordered_map. Since this is a difficult undertaking, it may be possible to build on a third-party module, like the Tr1UnorderedMapPrinter class found in /opt/mongodbtoolchain/v2/share/gcc-5.4.0/python/libstdcxx/v6/printers.py. A sketch of the idea follows.
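For illustration, here is a minimal sketch of what a memory-only traversal of a std::unordered_map can look like from a GDB Python script, reusing the iterator helpers shipped in the toolchain's printers.py. The _M_h member name is a libstdc++ (new-ABI) internal and the import path is the one from this ticket; both may differ across toolchain versions:

    import sys
    # Toolchain path taken from this ticket; adjust per toolchain version.
    sys.path.insert(0, "/opt/mongodbtoolchain/v2/share/gcc-5.4.0/python")

    import gdb
    from libstdcxx.v6.printers import StdHashtableIterator

    def unordered_map_items(map_value):
        """Yield (key, value) pairs from a gdb.Value wrapping a
        std::unordered_map, reading memory only. No inferior function is
        called, so no thread gets a chance to run, and the same code
        works against a core dump."""
        # _M_h is the internal hashtable member of std::unordered_map
        # in the libstdc++ new-ABI layout.
        for pair in StdHashtableIterator(map_value["_M_h"]):
            yield pair["first"], pair["second"]

Because this only dereferences gdb.Value objects, it sidesteps both problems listed above.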



 Comments   
Comment by Max Hirschhorn [ 28/Feb/22 ]

Hey team, I went ahead and implemented this functionality in a new GDB extension library - https://github.com/visemet/gdb-mongodb-server.

It hasn't been put through its paces like the hang analyzer has, so I anticipate there are certain cases (MongoDB toolchain version, compiler options, etc.) where it doesn't work. I did at least confirm that I could dump the LockManager from the core dump in BF-23292 (a relatively recent LockManager deadlock).

Comment by Brooke Miller [ 28/Jan/22 ]

We don't have plans to evaluate memory-based debugging info extraction in the hang-analyzer.

Comment by Eddie Louie [ 24/Apr/17 ]

Sorry for the delayed update. It took a bit of time to understand the Python code. So far I've managed to get the pair structures printed (each element of the unordered_map is a <ResourceId, LockHead*> pair). I was initially confused by the Tr1HashtableIterator code; it turns out StdHashtableIterator is the more appropriate one to use.

So, instead of going through each thread one by one and dumping the LockManager locks, we can access the LockManager's _lockBuckets array and iterate through each bucket. Within each bucket we can access the data field, which is the unordered_map containing all the LockHeads. For each LockHead we can then access the grantedList and conflictList; these lists contain the LockRequests that identify the holders and waiters for each lock. A sketch of this traversal follows below.

I have not integrated all the code yet; I'm looking into that next. Once that is done, we can remove the code that calls the findOrInsert function.
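For reference, a rough sketch of the traversal described above. The member names (_lockBuckets, data, grantedList, conflictList, _front, next) are taken from this comment and should be verified against the server source for the build being debugged; unordered_map_items is the helper sketched in the description:

    def iter_lock_heads(lock_manager, num_buckets):
        """Yield (ResourceId, LockHead*) pairs from every LockBucket of a
        gdb.Value wrapping a mongo::LockManager. num_buckets should match
        LockManager::_numLockBuckets for the build being inspected."""
        buckets = lock_manager["_lockBuckets"]  # pointer to LockBucket array
        for i in range(num_buckets):
            bucket = (buckets + i).dereference()
            # bucket.data is the std::unordered_map<ResourceId, LockHead*>
            for res_id, lock_head in unordered_map_items(bucket["data"]):
                yield res_id, lock_head

    def iter_lock_requests(lock_head):
        """Walk a LockHead's intrusive grantedList (holders) and
        conflictList (waiters), yielding each LockRequest."""
        for list_name in ("grantedList", "conflictList"):
            request_ptr = lock_head.dereference()[list_name]["_front"]
            while request_ptr:
                request = request_ptr.dereference()
                yield list_name, request
                request_ptr = request["next"]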

Comment by Max Hirschhorn [ 19/Apr/17 ]

eddie.louie, could you post an update on how far you've gotten in exploring the LockManager structures without calling functions (e.g. using Tr1UnorderedMapPrinter)? I'd like to know how close we think we are to making it possible to run on a core dump.

Comment by Max Hirschhorn [ 23/Mar/17 ]

Given that this would also resolve the issue we're having with calling functions on Solaris and would enable support for running the mongodb-show-locks and mongodb-waitsfor-graph GDB commands on core dumps, I think we should spend at least a week trying to get this to work.

If we discover that this is too difficult for some reason, then we can move it back to the backlog.
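For context, exposing such a memory-only traversal as a user-visible command is straightforward with GDB's Python API. A minimal sketch follows (the command name matches the one mentioned above; iter_lock_heads is the hypothetical traversal sketched in the earlier comment, and since locating the global LockManager instance without calling a function is build-specific, it is taken here as a user-supplied expression):

    import gdb

    class MongoDBShowLocks(gdb.Command):
        """Sketch: 'mongodb-show-locks EXPR', where EXPR evaluates to the
        LockManager instance. Since invoke() only reads memory through
        gdb.Value, it works on core dumps as well as live processes."""

        def __init__(self):
            gdb.Command.__init__(self, "mongodb-show-locks", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            lock_manager = gdb.parse_and_eval(arg)
            # 128 is assumed here for LockManager::_numLockBuckets; verify
            # against the source revision of the binary being debugged.
            for res_id, lock_head in iter_lock_heads(lock_manager, 128):
                print(res_id)

    MongoDBShowLocks()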
