[SERVER-47755] Send SIGABRT as a fallback in the hang analyzer Created: 24/Apr/20 Updated: 29/Oct/23 Resolved: 12/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Raiden Worley (Inactive) | Assignee: | Raiden Worley (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | quick-win, tig-hanganalyzer | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | STM 2020-10-05, STM 2020-10-19 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 14 | ||||||||
| Story Points: | 2 | ||||||||
| Description |
|
Due to TIG-859, the hang analyzer may fail to attach to processes and create core dumps in macOS tests. Core dumps are often the only way to get information about process state, and without them Server engineers can be completely blocked on BFs, such as the recent BF-16858. As a fallback, we should send SIGABRT to any process for which the debugger failed to create a core dump. We have precedent from |
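The fallback described above could look roughly like the following minimal sketch. It assumes the hang analyzer already knows which pids the debugger failed to dump. The helper name `abort_undumped_processes` is hypothetical and is not the hang analyzer's actual code. Whether the abort actually leaves a core file still depends on the OS core-dump settings (see the comments below).

```python
import os
import signal
import subprocess
import sys
from typing import Iterable, List


def abort_undumped_processes(pids: Iterable[int]) -> List[int]:
    """Send SIGABRT to each pid the debugger failed to dump (hypothetical helper).

    Returns the pids that were actually signaled; processes that have
    already exited are skipped instead of raising.
    """
    signaled = []
    for pid in pids:
        try:
            # Default SIGABRT disposition terminates the process and, if the
            # core limit and kernel settings allow it, writes a core file.
            os.kill(pid, signal.SIGABRT)
        except ProcessLookupError:
            # The process went away before we could signal it.
            continue
        signaled.append(pid)
    return signaled


if __name__ == "__main__":
    # Demo: start a sleeping child, pretend the debugger failed to dump it,
    # then abort it and report how it exited (negative = killed by signal).
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    abort_undumped_processes([child.pid])
    print("exit status:", child.wait(timeout=10))
```

Skipping already-exited pids keeps the fallback from failing the whole hang-analysis run when a process dies on its own between the debugger attempt and the signal.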
| Comments |
| Comment by Githook User [ 12/Oct/20 ] |
|
Author: Carl Raiden Worley (carl.worley@10gen.com, username: aggrand) Message: |
| Comment by Githook User [ 12/Oct/20 ] |
|
Author: Carl Raiden Worley (carl.worley@10gen.com, username: aggrand) Message: |
| Comment by Raiden Worley (Inactive) [ 08/Oct/20 ] |
|
In case anyone wants to run the hang analyzer locally after these changes: I was able to get consistent core dumps from SIGABRT locally by running sudo sysctl kern.corefile="dump_%N.%P.core" and ulimit -c unlimited. This worked for me on macOS 10.14.6, but apparently the sysctl API for this has changed a few times across macOS versions.
We'll go ahead with the changes in this ticket, which will immediately improve the situation so that core dumps are created more often than not. We can expect the remaining core dumps to start being generated successfully after BUILD-12127 is resolved. |
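For a local run, a tool could check the two prerequisites mentioned above from Python before it sends SIGABRT. The following is a minimal sketch, and both helper names are hypothetical. Raising the soft RLIMIT_CORE to the hard limit is roughly the in-process counterpart of ulimit -c unlimited when the hard limit is already unlimited, and child processes inherit it. Changing kern.corefile still requires sudo, so the sketch only reads the pattern. Because the comment notes that this sysctl has changed across macOS versions, the read may not work everywhere.

```python
import resource
import subprocess
import sys
from typing import Optional, Tuple


def raise_core_limit() -> Tuple[int, int]:
    """Raise this process's soft core-file size limit to its hard limit.

    An unprivileged process may raise its soft limit up to, but not past,
    the hard limit. Children started afterwards inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def macos_core_pattern() -> Optional[str]:
    """Return the kern.corefile naming pattern on macOS, or None elsewhere.

    Assumes `sysctl -n kern.corefile` is available (it may differ by version).
    """
    if sys.platform != "darwin":
        return None
    result = subprocess.run(
        ["sysctl", "-n", "kern.corefile"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("RLIMIT_CORE (soft, hard):", raise_core_limit())
    print("kern.corefile:", macos_core_pattern())
```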
| Comment by Cristopher Stauffer [ 02/Oct/20 ] |
|
Richard, I wanted to reopen this to make sure we answer Samy's question and see whether there are any other possible paths before we close this out. If in fact we have no other options, we should discuss what test coverage for macOS looks like in the future. |
| Comment by Samyukta Lanka [ 29/Sep/20 ] |
|
richard.samuels Repl has been seeing a lot of failures on MacOS without core dumps recently. Do you know if there's another ticket we could track with an alternative approach? |
| Comment by Richard Samuels (Inactive) [ 29/Sep/20 ] |
|
SIGABRT does not reliably produce core dumps on macOS. It might even be less reliable than what we currently do with lldb. |
| Comment by Brooke Miller [ 28/May/20 ] |
|
Need to wait until Archive Data Files Project (PM-1569) is done, or until the hang_analyzer behaves differently for macOS. |
| Comment by Brooke Miller [ 12/May/20 ] |
|
We discussed adding an option to make the hang analyzer a no-op when it runs on macOS in Evergreen. We also discussed adding a code path to resmoke's signal handler that sends SIGABRT to all processes when archival is not configured on macOS and a hang occurs. |
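One way to express the two decisions from this discussion is as a small pure function. The following is a hypothetical sketch, not resmoke's actual signal handler. It assumes the no-op behavior is controlled by a flag, named `skip_debugger_on_macos` here, that Evergreen would set. It also assumes that "archival configured" is a single boolean. Keeping the decision separate from the code that sends signals makes each combination easy to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HangResponse:
    run_debugger: bool  # attach the debugger to create core dumps
    sigabrt_all: bool   # fall back to sending SIGABRT to every test process


def plan_hang_response(platform: str, skip_debugger_on_macos: bool,
                       archival_configured: bool) -> HangResponse:
    """Decide how to react to a hang (hypothetical names and inputs)."""
    is_macos = platform == "darwin"
    # The option turns the hang analyzer into a no-op on macOS only.
    run_debugger = not (is_macos and skip_debugger_on_macos)
    # Without archival on macOS, SIGABRT is the remaining way to get cores.
    sigabrt_all = is_macos and not archival_configured
    return HangResponse(run_debugger=run_debugger, sigabrt_all=sigabrt_all)


if __name__ == "__main__":
    print(plan_hang_response("darwin", skip_debugger_on_macos=True,
                             archival_configured=False))
```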
| Comment by Raiden Worley (Inactive) [ 01/May/20 ] |
|
Re above comment: |
| Comment by Raiden Worley (Inactive) [ 24/Apr/20 ] |
|
Worth thinking about: if we manage to send SIGSTOP to resolve TIG-768 as discussed in the comments of |