[SERVER-45587] change the format of SIGUSR2 stack dumps  Created: 15/Jan/20  Updated: 29/Oct/23  Resolved: 30/Jan/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.4 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Billy Donahue | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Dev Tools 2020-01-27, Dev Tools 2020-02-10 |
| Description |

All threads' stack traces are currently dumped together in one large log statement as a single JSON document. The request is to break that large JSON document into a sequence of separate log statements, one per thread, reusing the crash-style stacktrace format (with some changes to obscure any raw addresses). These per-thread statements should be bracketed by metadata log statements indicating the beginning and end of the stack trace set.
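For illustration only, a minimal Python sketch of the requested shape (the server itself is written in C++; the marker messages, the "LIB_SELF" key, and the exact field names below are assumptions drawn from the comments on this ticket):

```python
import json
import sys
import threading
import traceback

def dump_all_thread_stacks(log=print):
    """Minimal sketch: one prologue record, one record per thread, and one
    epilogue record carrying the shared metadata. Field and message names
    are assumptions, not the actual server format."""
    frames = sys._current_frames()  # thread id -> current stack frame
    log(json.dumps({"msg": "===== multithread stacktrace session begin ====="}))
    for t in threading.enumerate():
        frame = frames.get(t.ident)
        if frame is None:
            continue
        backtrace = [
            # "b" stands in for a library key that is resolved through the
            # epilogue's processInfo; it is deliberately not a raw address.
            {"b": "LIB_SELF", "s": "{} ({}:{})".format(f.name, f.filename, f.lineno)}
            for f in traceback.extract_stack(frame)
        ]
        log(json.dumps({"name": t.name, "tid": t.ident, "backtrace": backtrace}))
    # Epilogue: shared metadata emitted once for the whole group.
    log(json.dumps({"msg": "===== multithread stacktrace session end =====",
                    "processInfo": {"somap": [{"b": "LIB_SELF", "path": sys.executable}]}}))

if __name__ == "__main__":
    dump_all_thread_stacks()
```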
| Comments |
| Comment by Githook User [ 30/Jan/20 ] |

Author: {'username': 'BillyDonahue', 'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com'}
Message:
| Comment by Billy Donahue [ 30/Jan/20 ] |

Fixed by
| Comment by Githook User [ 30/Jan/20 ] |

Author: {'username': 'BillyDonahue', 'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com'}
Message:
create mode 100755 buildscripts/mongosymb_multithread.py
| Comment by Billy Donahue [ 24/Jan/20 ] |

commit queue
| Comment by Bruce Lucas (Inactive) [ 24/Jan/20 ] |

It's good to hear that we can disable truncation as needed, but I would still be worried about the practical impact of logging this as a single JSON line that might end up being multiple MB (say 1 KB per stack trace times tens of thousands of threads). A quick test with emacs indicates that it can handle such a long line, but moving the cursor around at points after that line gets very slow. Other log viewing tools might have other problems.
| Comment by Henrik Edin [ 24/Jan/20 ] |

I got the code review for this and I realize that I'm a little late to the party. But I feel like the format in the logs is missing a great opportunity to take more advantage of structured logging, instead of separating the output into multiple log statements like the current implementation. A fuller use of structured logging could produce output something like this:

It would result in a very large object, but this log statement would have truncation turned off. And we'd get a single statement with a single ID to look up when parsing.

Anyway, just wanted to point that out.
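Henrik's inline example did not survive this export. Purely as a hypothetical illustration of the single-statement idea, not his original sketch, such a log attribute object might look roughly like the following (every field name and value is an assumption):

```python
# Hypothetical illustration only; field names are assumptions, not the actual
# server output. The whole structure would be the attributes of one log
# statement with truncation disabled.
stack_dump_attr = {
    "threads": [
        {
            "name": "SignalHandler",
            "tid": 140213,
            "backtrace": [
                {"b": "MONGOD", "o": "0x1A2B3C", "s": "mongo::printAllThreadStacks"},
                {"b": "LIBC", "o": "0x4D5E6F", "s": "start_thread"},
            ],
        },
        # ... one entry per thread ...
    ],
    "processInfo": {
        "mongodbVersion": "4.3.4",
        "somap": [
            {"b": "MONGOD", "path": "/usr/bin/mongod"},
            {"b": "LIBC", "path": "/lib/x86_64-linux-gnu/libc.so.6"},
        ],
    },
}
```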
| Comment by Bruce Lucas (Inactive) [ 18/Jan/20 ] |

That looks great. Thanks for doing all this - I think this will be a big help in diagnosing some difficult problems in customer situations.
| Comment by Billy Donahue [ 17/Jan/20 ] |

sample output from https://mongodbcr.appspot.com/544970038/#ps533200074
| Comment by Billy Donahue [ 17/Jan/20 ] |

https://mongodbcr.appspot.com/544970038/ - super early sneak preview. Made a new tool, mongosymb_multithread.py, that calls mongosymb.py.
| Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ] |

At a quick look I think that looks good. (I think the wrapping in the log line, i.e. "msg":"{rec}" and "rec":{...}, isn't great, but I think that's probably a discussion for the structured logging project.)
| Comment by Billy Donahue [ 17/Jan/20 ] |

mongod.out.json
| Comment by Billy Donahue [ 17/Jan/20 ] |

Yes, "useless" was overstated. Withdrawn! Yes, we can emit the processInfo first. The "somap" is trimmed to only include libraries that are actually IN the emitted backtraces. So I'll have to reorg a bit to make this two-pass. It will work fine.
| Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ] |

Good point. Does that info need to go at the end - can't it be part of the prologue? Also I think "useless" is an overstatement - depending on the kind of problem you're tracking down, I've often found that just the symbolization provided by the "s" fields is enough to understand the problem. But yes, it would be better to ensure that we can do the full post-hoc symbolization.
| Comment by Billy Donahue [ 17/Jan/20 ] |

bruce.lucas, one concern I have with the proposal is that the backtrace lines are useless until you get to the epilogue.
| Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ] |

I think that sounds good.
| Comment by Billy Donahue [ 17/Jan/20 ] |

Yeah, I just want to make sure that's acceptable before proceeding. First, there will be a prologue line indicating the start of a stacktrace group. For each thread, there will be a log line consisting of one JSON object containing a "name", a "tid", and a "backtrace" array. After all threads, there will be an epilogue log line indicating the end of the stacktrace group; this is a JSON object containing the "processInfo" field. The "b" field in each frame is a made-up key identifying a library that will appear in the "processInfo" field of the epilogue line. It is not an address, which keeps addresses from being disclosed for a running server. mongosymb.py will be modified to accept the entire block of log lines from prologue to epilogue, transforming its input into an output with the enhanced DWARF info we're used to getting from mongosymb.py, but otherwise not losing any data.
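A rough sketch of how a wrapper along the lines of mongosymb_multithread.py might group such a block, assuming hypothetical marker strings and field names (the actual script may differ):

```python
import json

def collect_stack_dump(lines):
    """Rough sketch: gather the per-thread records between the prologue and
    epilogue markers and resolve each frame's "b" key against the epilogue's
    processInfo. Marker strings and helper names are assumptions."""
    threads, process_info, in_block = [], None, False
    for line in lines:
        rec = json.loads(line)
        msg = rec.get("msg", "")
        if "stacktrace" in msg and "begin" in msg:
            in_block = True                         # prologue
        elif "stacktrace" in msg and "end" in msg:
            process_info = rec.get("processInfo")   # epilogue with shared metadata
            break
        elif in_block:
            threads.append(rec)                     # {"name", "tid", "backtrace"}
    libs = {e["b"]: e for e in (process_info or {}).get("somap", [])}
    for t in threads:
        for frame in t.get("backtrace", []):
            # "b" is a made-up library key rather than an address; attach the
            # matching library record so a symbolizer can use it later.
            frame["lib"] = libs.get(frame.get("b"))
    return {"threads": threads, "processInfo": process_info}
```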
| Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ] |

Can we just dump the shared metadata once, and adjust mongosymb.py to understand (or write a new wrapper script)?
| Comment by Billy Donahue [ 17/Jan/20 ] |

As a proof-of-concept: mongod.out. Each line works with mongosymb.py after stripping the "2020-01-17T12:18:03.825-0500 I COMMAND [SignalHandler]..." line prefixes. In a technical sense, this is a huge downgrade in efficiency. Compare to the original log: each thread dump has the FULL ".processInfo" field, including the mongod version, ".uname", etc.
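A minimal sketch of stripping that prefix, assuming the prefix shape shown in the example above, so that the remaining JSON lines can be handed to mongosymb.py:

```python
import re
import sys

# Assumed prefix shape, inferred from the example above:
# "2020-01-17T12:18:03.825-0500 I  COMMAND  [SignalHandler] {...}"
PREFIX = re.compile(r"^\S+ +[A-Z] +\S+ +\[[^\]]+\] +")

for line in sys.stdin:
    stripped = PREFIX.sub("", line, count=1)
    # Keep only lines that now start with a JSON document.
    if stripped.lstrip().startswith("{"):
        sys.stdout.write(stripped)
```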
| Comment by Bruce Lucas (Inactive) [ 15/Jan/20 ] |

This seems important, as in production there can be tens of thousands of threads, making the log line unworkably large (possibly even triggering truncation?)