[SERVER-41654] Better behavior for large backtraces printed from signal handler Created: 12/Jun/19  Updated: 29/Oct/23  Resolved: 09/Oct/19

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Ian Boros Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Dev Tools 2019-07-29, Dev Tools 2019-08-12, Dev Tools 2019-09-09, Dev Tools 2019-09-23, Dev Tools 2019-10-07, Dev Tools 2019-10-21
Participants:

 Description   

It looks like we have a special stream class which we use to avoid calling malloc in signal handlers here.

It uses a pre-allocated buffer of 100k bytes. Unfortunately, the JSON blob for a very deep stack trace (on my machine, at least) can be more than 100,000 characters, and get truncated when logged from the signal handler. I'm not sure if stack traces can be so large on "release" builds, but it would be a shame if a customer's node crashed and the printed stacktrace were cut off. I think we should consider doing:

1) Show the human readable stacktrace first (see here). This way, if the output is truncated, something useful is printed. A partial JSON blob is basically useless for day-to-day debugging purposes.
2) Increase the size of the buffer used in the signal handler's output stream (see here).



 Comments   
Comment by Githook User [ 12/Oct/19 ]

Author:

{'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-41654 fix libunwind build
Branch: master
https://github.com/mongodb/mongo/commit/8d5fc650b1c3a38a9d4630a48fe856eba56d6171

Comment by Billy Donahue [ 12/Oct/19 ]

http://mongodbcr.appspot.com/499780030

Comment by Githook User [ 09/Oct/19 ]

Author:

{'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com', 'name': 'Billy Donahue'}

Message: SERVER-41654 trim somap to relevant libs on stacktrace
Branch: master
https://github.com/mongodb/mongo/commit/00ad16093a6e6468d4d2ff1f46d8e5c08fcca21b

Comment by Billy Donahue [ 04/Oct/19 ]

https://mongodbcr.appspot.com/511390018/

Comment by Billy Donahue [ 21/Sep/19 ]

After spending a lot of time with the stack trace code, I think we're wasting most of that 100k on stuff we don't need.
We are dumping the processInfo.somap object into the JSON line, and for a dynamic link, it's huge.
It's like high-hundreds of entries, and each is a filename., so that's something like 80k

I think we can save by selecting just those somap entries that represent addrs in the stacktrace.
The stacktrace might be maybe 20-100 addrs deep, with lots of addrs coming from the same few libs.
So that's probably only a partial somap of maybe 20 entries instead of 800 if we're selective. 2k, not 80k.
We could try that.

Comment by Billy Donahue [ 07/Aug/19 ]

There might be a way that we don't have to try to estimate an upper bound for the backtrace text size.

The filesystem operations open,close,write,lseek,unlink are all signal-safe (man 7 signal-safety). We could write a big ol' (possibly anonymous) (possibly ramfs) file with the trace results. When the tracer is done writing that file, the signal handler's final lines of code can decide how to consume or intelligently truncate it.

Comment by Ian Boros [ 26/Jul/19 ]

Ian Boros - Regarding the buffer size, I think we can probably just expand that to 1 MB without anyone caring, and I propose that we do just that.

Sounds good to me. I think this is the important part.

Regarding reversing the order, that is going to be a little harder per A. Jesse Jiryu Davis's comment. Max Hirschhorn, could you provide some guidance on how disruptive that might be? If making the buffer a lot bigger addresses 99% of these instances, is it still worth changing the order?

Seems reasonable to defer this given what you said. I think it'd still be a worthwhile change to make, though. If we ever get a HELP ticket about an occasional crash in some deep code path, we might really wish we had this.

acm

Comment by David Bradford (Inactive) [ 26/Jul/19 ]

I think we would need to update the build baron tools to be able to recognize and consume the change as well as continue to support stack traces without this change. That seems doable. But it would be nice to have a heads up for when this would land and some samples of the new output we could use to base our updates the tooling.

Comment by Max Hirschhorn [ 26/Jul/19 ]

Logkeeper has a restriction of a single long line containing only 4MB worth of text. As for the impact on the Build Baron tooling, I would want to defer to david.bradford on that.

Comment by Andrew Morrow (Inactive) [ 24/Jul/19 ]

ian.boros - Regarding the buffer size, I think we can probably just expand that to 1 MB without anyone caring, and I propose that we do just that.

Regarding reversing the order, that is going to be a little harder per jesse's comment. max.hirschhorn, could you provide some guidance on how disruptive that might be? If making the buffer a lot bigger addresses 99% of these instances, is it still worth changing the order?

Comment by A. Jesse Jiryu Davis [ 21/Jun/19 ]

Note for max.hirschhorn: if we do this it will change the stacktrace order, the Build Baron scripts will need to be adapted for it.

Generated at Thu Feb 08 04:58:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.