[SERVER-45587] change the format of SIGUSR2 stack dumps Created: 15/Jan/20  Updated: 29/Oct/23  Resolved: 30/Jan/20

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Improvement Priority: Major - P3
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod.out     File mongod.out.json     Text File mongod.symb.txt    
Issue Links:
Depends
is depended on by SERVER-45068 update mongosymb.py to handle multi-t... Closed
Backwards Compatibility: Fully Compatible
Sprint: Dev Tools 2020-01-27, Dev Tools 2020-02-10
Participants:

 Description   

All threads' stack traces are dumped together in one large log statement as a JSON document.

The request is to downgrade these in a way, breaking the large json document into a sequence of separate log statements, one per thread, reusing the crash-style stacktrace format (with some changes to obscure any raw addresses). These should be bracketed by some metadata log statements indicating beginning and  end of the stack trace set.



 Comments   
Comment by Githook User [ 30/Jan/20 ]

Author:

{'username': 'BillyDonahue', 'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-45587 remove unused variable
Branch: master
https://github.com/mongodb/mongo/commit/e6468fe7daf0da3cd82521202ca32d6781242250

Comment by Billy Donahue [ 30/Jan/20 ]

Fixed by SERVER-45068

Comment by Githook User [ 30/Jan/20 ]

Author:

{'username': 'BillyDonahue', 'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-45587 SIGUSR2 one thread per log line

  • add mongosymb_multithread.py
  • mongosymb.py argparse and slight refactor for reuse.

create mode 100755 buildscripts/mongosymb_multithread.py
Branch: master
https://github.com/mongodb/mongo/commit/bc68d8424d15a9f3eeab35730b0043459e289ad9

Comment by Billy Donahue [ 24/Jan/20 ]

commit queue

Comment by Bruce Lucas (Inactive) [ 24/Jan/20 ]

It's good to hear that we can disable truncation as needed, but I would still be worried about the practical impact of logging this as a single JSON line that might end up being multiple MB (say 1k per stack trace times 10s of thousands of threads). A quick test with emacs indicates that it can handle such a long line, but performance moving the cursor around at points after the line gets very slow. Other log viewing tools might have other problems.

Comment by Henrik Edin [ 24/Jan/20 ]

I got the code review for this and I realize that I'm a little late to the party. But I feel like the format in the logs is missing a great opportunity to take more advantage of Structured logging instead of separating the output in multiple log statements like the current implementation.

A more full use of structured logging could have the output to be something like this:

 

{
"id": X,
"attr": {
  "prologue": {...} // everything needed to symbolize
  "threads": [{...}, {...}, ..., {...}] // JSON array of each thread containing a subobject for everything related to each thread
  }
}

 

 

It would result in a very large object but this log statement would have truncation turned off. But we'd get a single statement with a single ID to look up when parsing.

 

Anyway, just wanted to point that out.

Comment by Bruce Lucas (Inactive) [ 18/Jan/20 ]

That looks great. Thanks for doing all this - I think this will be a big help in diagnosing some difficult problems in customer situations.

Comment by Billy Donahue [ 17/Jan/20 ]

sample output from https://mongodbcr.appspot.com/544970038/#ps533200074

mongod.symb.txt

Comment by Billy Donahue [ 17/Jan/20 ]

https://mongodbcr.appspot.com/544970038/

super early sneak preview

Made a new tool: mongosymb_multithread.py that calls mongosymb.py.
So I don't touch mongosymb.py.

Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ]

At a quick look I think that looks good.

(I think the wrapping in the log line, i.e. "msg":"{rec}" and "rec":{...} aren't great, but I think that's probably a discussion for the structured logging project.)

Comment by Billy Donahue [ 17/Jan/20 ]

mongod.out.json is the json structured log after the proposed refactor. WDYT?

Comment by Billy Donahue [ 17/Jan/20 ]

Yes, "useless" was overstated. Withdrawn!   Only the "b" and "o" fields would be useless. Sometimes you don't get an "s".

Yes, we can emit the processInfo first. The "somap" is trimmed to only include libraries that are actually IN the emitted backtraces. So I'll have to reorg a bit to make this two-pass. It will work fine.

Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ]

Good point. Does that info need to go at the end - can't it be part of the prolog?

Also I think "useless" is an overstatement - depending on the kind of problem you're tracking down, I've often found that just the symbolization provided by the "s" fields is enough to understand the problem. But yes would be better to ensure that we can do the full post-hoc symbolization.

Comment by Billy Donahue [ 17/Jan/20 ]

bruce.lucas, one concern I have with the proposal is that the backtrace lines are useless until you get to the epilogue.
If something stops the epilogue from being printed out in full, or if it just isn't captured by the support customer, we can't use any of the other lines. In the very wasteful version in my current PoC (attached), the lines are at least usable one-at-a-time.

Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ]

I think that sounds good.

Comment by Billy Donahue [ 17/Jan/20 ]

Yeah, I just want to make sure that's acceptable before proceeding.
So the proposal now:

First, there will be a prologue line indicating the start of a stacktrace group.
It will contain a json array with tids of all the threads we missed, if there were any.

For each thread, a log line of one json object containing a "name", "tid" and "backtrace" array.
Each element of a backtrace array has "b", "o" fields, and maybe more hints like "s" and "s+" if known (sym and symOffset). 

After all threads, there will be an epilogue log line indicating the end of a stacktrace group. This is a json object containing the "processInfo" field.

The "b" field in each frame is a made-up key indicating a library that will appear in the "processInfo" field of the epilogue line. It is not an address. This keeps addresses from being disclosed for a running server.

Mongosymb.py will be modified to accept the entire block of log lines from prologue to epilogue, transforming its input into an output with the mongosymb.py enhanced dwarf info we're used to getting, but otherwise not losing any data.

Comment by Bruce Lucas (Inactive) [ 17/Jan/20 ]

Can we just dump the shared metadata once, and adjust mongosymb.py to understand (or write a new wrapper script)?

Comment by Billy Donahue [ 17/Jan/20 ]

As a proof-of-concept, mongod.out is an example of what we'd be producing in response to SIGUSR2.

Each line works with mongosymb.py after stripping the

    "2020-01-17T12:18:03.825-0500 I COMMAND [SignalHandler]..."

line prefixes.

In a technical sense, this is a huge downgrade in efficiency.

Compare to original log

Each thread dump has the FULL ".processInfo" field, including mongod version and ".uname", etc.
We're also dumping out the library metadata for each library on each thread in the ".somap".

Comment by Bruce Lucas (Inactive) [ 15/Jan/20 ]

This seems important, as in production there can be tens of thousands of threads, making the log line unworkably large (possibly even triggering truncation?)

Generated at Thu Feb 08 05:09:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.