[SERVER-46360] Hang analyzer json logging improvement Created: 24/Feb/20 Updated: 29/Oct/23 Resolved: 10/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc4, 4.7.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andy Schwerin | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Backport Requested: |
v4.4
|
||||||||||||||||||||||||
| Sprint: | Dev Tools 2020-02-24, Dev Tools 2020-03-09, Dev Tools 2020-03-23 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Description |
|
Log messages from the hang analyzer under structured logging should be structured, not just text embedded in a string in a json attribute. |
| Comments |
| Comment by Githook User [ 04/May/20 ] | |||||||||||
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: (cherry picked from commit e2d5e628f4c65a66eb5ca39f43aa7f71b2fd1d6b) | |||||||||||
| Comment by Billy Donahue [ 10/Mar/20 ] | |||||||||||
|
changes the undocumented lockInfo command | |||||||||||
| Comment by Githook User [ 10/Mar/20 ] | |||||||||||
|
Author: {'username': 'BillyDonahue', 'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com'}Message: | |||||||||||
| Comment by Billy Donahue [ 05/Mar/20 ] | |||||||||||
|
Code Review: Also addresses | |||||||||||
| Comment by Billy Donahue [ 02/Mar/20 ] | |||||||||||
|
I think I didn't hit the right target in my first commit here.
The real target here should be message IDs related to 20521 and 20522. | |||||||||||
| Comment by Githook User [ 28/Feb/20 ] | |||||||||||
|
Author: {'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}Message: | |||||||||||
| Comment by Eric Milkie [ 27/Feb/20 ] | |||||||||||
|
Your analysis sounds correct to me. I didn't actually see a hang analyzer graph creation fail; only the output of the lock dump. So this ticket can be to just clean up the lock dump output. | |||||||||||
| Comment by Billy Donahue [ 26/Feb/20 ] | |||||||||||
|
milkie, upon closer inspection, the hang analyzer seems to be doing all of its reconnaissance through gdb inspection of C++ symbols and function calls. This is all in `buildscripts/gdb/mongo_lock.py`. The construction of the lock graph is done by find_mutex_holder. In there we just backtrace all the threads and examine their extant call to std::mutex::lock, if any. We extract the waited-on mutex object's __owner member, and generate a graph of waiting threads to mutex-holding threads, connected by edges representing mutex objects that connect the threads. It all looks to be on the up-and-up, and I don't see any structured logging influence here. The only thing structured logging affects is the stdout of the hang_analyzer. Is that the only problem? | |||||||||||
| Comment by Billy Donahue [ 26/Feb/20 ] | |||||||||||
|
Summary of problem and requirements as I understand them. The hang_analyzer.py attaches gdb to a running mongod. gdb (via the buildscripts/gdb/mongo.py command extensions) makes a target function call to the C++ function "call ('mongo::(anonymous namespace)::globalLockManager').dump()" (updated to correct function name) So we need to upgrade the LOGV2 message from its current StringBuilder into a structured object, and change the lines that parse the gdb output. | |||||||||||
| Comment by Eric Milkie [ 26/Feb/20 ] | |||||||||||
|
I think we need to update the lock-dump output, and then change the hang analyzer helper script that generates the waits-for graph to be able to parse this new format. | |||||||||||
| Comment by Andy Schwerin [ 24/Feb/20 ] | |||||||||||
|
Not sure exactly the required improvement, here. |