[SERVER-48705] resmoke.py sending SIGABRT to take core dumps on fixture teardown may overwrite core files from hang analyzer Created: 10/Jun/20 Updated: 29/Oct/23 Resolved: 22/Jun/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0, 4.4.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Vlad Rachev (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tig-hanganalyzer, tig-resmoke |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | STM 2020-06-29 |
| Participants: | |
| Linked BF Score: | 5 |
| Story Points: | 2 |
| Description |
|
When archival is enabled for a test or test suite, resmoke.py sends a SIGABRT signal to each of its fixture processes during teardown so that they produce core dumps (in addition to collecting the mongod data files). If a JavaScript test has already invoked the hang analyzer via an assert.soon(), the core file generated by the hang analyzer may be overwritten by the core dump produced by that later SIGABRT.
Note that the core dump produced by resmoke.py's SIGABRT is unlikely to match the thread stacks in the hang analyzer output, because running the hang analyzer is expected to perturb the state of the MongoDB cluster. |
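The teardown behaviour described above can be sketched as follows. This is a minimal illustration, not resmoke.py's code: the helper name `abort_for_core_dump` is hypothetical, and whether a core file is actually written, and under which name, depends on the host's core-dump configuration, which this ticket does not specify.

```python
import signal
import subprocess
import sys


def abort_for_core_dump(proc: subprocess.Popen) -> int:
    """Send SIGABRT to a running process and return its exit code.

    Hypothetical helper for illustration only. With the default signal
    disposition, SIGABRT terminates the process; the operating system writes
    a core file only if core dumps are enabled on the host.
    """
    proc.send_signal(signal.SIGABRT)
    return proc.wait(timeout=30)


if __name__ == "__main__":
    # Stand-in for a fixture process such as mongod.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # On POSIX, Popen reports death-by-signal as the negated signal number.
    print(abort_for_core_dump(child))
```

If an earlier tool has already written a core file for the same process under the same name, a dump taken this way is the one that can replace it.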
| Comments |
| Comment by Githook User [ 15/Oct/20 ] |
|
Author: {'name': 'vrachev', 'email': 'vlad.rachev@mongodb.com', 'username': 'vrachev'} Message: |
| Comment by Githook User [ 22/Jun/20 ] |
|
Author: {'name': 'vrachev', 'email': 'vlad.rachev@mongodb.com', 'username': 'vrachev'} Message: Adds an option to the hang-analyzer to kill processes after finishing analysis. |
| Comment by Robert Guo (Inactive) [ 11/Jun/20 ] |
|
Once the hang analyzer from the shell's assert.soon() function has run, we should have all the diagnostic info we need and there would be no need to continue running the fixtures. We can ask the shell to SIGKILL all its fixtures after calling the hang analyzer so the archival code does not have to do it later on. |
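The approach suggested above (analyze first, then kill the fixtures so that archival has nothing left to abort) might look roughly like the sketch below. It assumes a POSIX host; `analyze_then_kill` and the `analyze` callback are hypothetical names used for illustration and are not the hang analyzer's actual interface or option.

```python
import signal
import subprocess
import sys
from typing import Callable, List, Sequence


def analyze_then_kill(
    procs: Sequence[subprocess.Popen],
    analyze: Callable[[subprocess.Popen], None],
) -> List[int]:
    """Run `analyze` on every process, then SIGKILL any that are still alive.

    Hypothetical sketch: analysis (e.g. collecting stacks and core dumps)
    finishes for all processes before any of them is killed, so no later
    SIGABRT is needed and the collected core files are left untouched.
    """
    for proc in procs:
        analyze(proc)
    for proc in procs:
        if proc.poll() is None:  # still running
            proc.send_signal(signal.SIGKILL)
        proc.wait()
    return [proc.returncode for proc in procs]


if __name__ == "__main__":
    # Stand-ins for fixture processes.
    fixtures = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(2)
    ]
    codes = analyze_then_kill(fixtures, lambda p: print("analyzing pid", p.pid))
    print(codes)
```

SIGKILL is used rather than SIGABRT because it terminates the process without producing a core dump, so it cannot replace the files the analysis step collected.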