[SERVER-41654] Better behavior for large backtraces printed from signal handler Created: 12/Jun/19 Updated: 29/Oct/23 Resolved: 09/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Dev Tools 2019-07-29, Dev Tools 2019-08-12, Dev Tools 2019-09-09, Dev Tools 2019-09-23, Dev Tools 2019-10-07, Dev Tools 2019-10-21 |
| Participants: |
| Description |
|
It looks like we have a special stream class which we use to avoid calling malloc in signal handlers here. It uses a pre-allocated buffer of 100k bytes. Unfortunately, the JSON blob for a very deep stack trace (on my machine, at least) can be more than 100,000 characters, and get truncated when logged from the signal handler. I'm not sure if stack traces can be so large on "release" builds, but it would be a shame if a customer's node crashed and the printed stacktrace were cut off. I think we should consider doing: 1) Show the human readable stacktrace first (see here). This way, if the output is truncated, something useful is printed. A partial JSON blob is basically useless for day-to-day debugging purposes. |
| Comments |
| Comment by Githook User [ 12/Oct/19 ] |
|
Author: {'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}Message: |
| Comment by Billy Donahue [ 12/Oct/19 ] |
| Comment by Githook User [ 09/Oct/19 ] |
|
Author: {'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com', 'name': 'Billy Donahue'}Message: |
| Comment by Billy Donahue [ 04/Oct/19 ] |
| Comment by Billy Donahue [ 21/Sep/19 ] |
|
After spending a lot of time with the stack trace code, I think we're wasting most of that 100k on stuff we don't need. I think we can save by selecting just those somap entries that represent addrs in the stacktrace. |
| Comment by Billy Donahue [ 07/Aug/19 ] |
|
There might be a way that we don't have to try to estimate an upper bound for the backtrace text size. The filesystem operations open,close,write,lseek,unlink are all signal-safe (man 7 signal-safety). We could write a big ol' (possibly anonymous) (possibly ramfs) file with the trace results. When the tracer is done writing that file, the signal handler's final lines of code can decide how to consume or intelligently truncate it. |
| Comment by Ian Boros [ 26/Jul/19 ] |
Sounds good to me. I think this is the important part.
Seems reasonable to defer this given what you said. I think it'd still be a worthwhile change to make, though. If we ever get a HELP ticket about an occasional crash in some deep code path, we might really wish we had this. |
| Comment by David Bradford (Inactive) [ 26/Jul/19 ] |
|
I think we would need to update the build baron tools to be able to recognize and consume the change as well as continue to support stack traces without this change. That seems doable. But it would be nice to have a heads up for when this would land and some samples of the new output we could use to base our updates the tooling. |
| Comment by Max Hirschhorn [ 26/Jul/19 ] |
|
Logkeeper has a restriction of a single long line containing only 4MB worth of text. As for the impact on the Build Baron tooling, I would want to defer to david.bradford on that. |
| Comment by Andrew Morrow (Inactive) [ 24/Jul/19 ] |
|
ian.boros - Regarding the buffer size, I think we can probably just expand that to 1 MB without anyone caring, and I propose that we do just that. Regarding reversing the order, that is going to be a little harder per jesse's comment. max.hirschhorn, could you provide some guidance on how disruptive that might be? If making the buffer a lot bigger addresses 99% of these instances, is it still worth changing the order? |
| Comment by A. Jesse Jiryu Davis [ 21/Jun/19 ] |
|
Note for max.hirschhorn: if we do this it will change the stacktrace order, the Build Baron scripts will need to be adapted for it. |