[SERVER-48705] resmoke.py sending SIGABRT to take core dumps on fixture teardown may overwrite core files from hang analyzer Created: 10/Jun/20  Updated: 29/Oct/23  Resolved: 22/Jun/20

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.2

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Vlad Rachev (Inactive)
Resolution: Fixed Votes: 0
Labels: tig-hanganalyzer, tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-46687 Run hang-analyzer from resmoke and in... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: STM 2020-06-29
Participants:
Linked BF Score: 5
Story Points: 2

 Description   

When archival is enabled for a test or test suite, resmoke.py sends a SIGABRT signal to its fixture processes to take core dumps of them (in addition to collecting the mongod data files). If a JavaScript test has already invoked the hang analyzer via assert.soon(), the core file that the hang analyzer generated for that process will be overwritten.

[fsm_workload_test:agg_merge_when_matched_replace_with_new] 2020-05-26T08:34:20.234+0000 sh118695| Saved corefile dump_mongod.4235.core
...
[ShardedClusterFixture:job0:shard0:secondary0] Attempting to send SIGABRT from resmoke to mongod on port 20002 with pid 4235...

Note that the core dump taken by resmoke.py sending a SIGABRT signal is unlikely to match the thread stacks in the hang analyzer output because running the hang analyzer is expected to perturb the state of the MongoDB cluster.
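The collision depends on both dumps landing at the same path. The filenames in the log excerpt suggest a `dump_<name>.<pid>.core` scheme, so a teardown step could check for an existing core before choosing a signal. The sketch below is a hypothetical guard: `hang_analyzer_core_path` and `teardown_signal` are made-up names, and the naming scheme is assumed from the log. It is not what resmoke.py does; the merged change took a different route, described in the comments below. It is POSIX-only, since `signal.SIGKILL` does not exist on Windows.

```python
import os
import signal


def hang_analyzer_core_path(core_dir: str, process_name: str, pid: int) -> str:
    """Path where the hang analyzer would have saved a core for `pid`.

    Assumption: the dump_<name>.<pid>.core naming seen in the log excerpt.
    """
    return os.path.join(core_dir, f"dump_{process_name}.{pid}.core")


def teardown_signal(core_dir: str, process_name: str, pid: int) -> signal.Signals:
    """Pick the signal a (hypothetical) teardown step sends to a fixture process.

    If a hang analyzer core already exists for this pid, return SIGKILL: the
    kernel never writes a core for SIGKILL, so the existing file is preserved.
    Otherwise return SIGABRT so a core is collected at teardown.
    """
    if os.path.exists(hang_analyzer_core_path(core_dir, process_name, pid)):
        return signal.SIGKILL
    return signal.SIGABRT
```

A teardown would then call `os.kill(pid, teardown_signal(core_dir, "mongod", pid))`. Note that the check only protects the file; as the note above says, a second SIGABRT core would not match the hang analyzer's stacks anyway.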



 Comments   
Comment by Githook User [ 15/Oct/20 ]

Author:

{'name': 'vrachev', 'email': 'vlad.rachev@mongodb.com', 'username': 'vrachev'}

Message: SERVER-48705 disable taking cores during resmoke fixture teardown
Branch: v4.4
https://github.com/mongodb/mongo/commit/392093165d9b23f2ebaf5d6fe475ccbef4c86d3b

Comment by Githook User [ 22/Jun/20 ]

Author:

{'name': 'vrachev', 'email': 'vlad.rachev@mongodb.com', 'username': 'vrachev'}

Message: SERVER-48705 resmoke.py sending SIGABRT to take core dumps on fixture teardown may overwrite core files from hang analyzer.

Adds an option to the hang-analyzer to kill processes after finishing analysis.
Updates assert.soon's usage of the hang-analyzer to use this option.
Branch: master
https://github.com/mongodb/mongo/commit/18f88ce0680ab946760b599437977ffd60c49678

Comment by Robert Guo (Inactive) [ 11/Jun/20 ]

Once the hang analyzer from the shell's assert.soon() function has run, we should have all the diagnostic info we need, so there is no need to keep the fixtures running. We can ask the shell to SIGKILL all its fixtures after calling the hang analyzer so the archival code does not have to do it later on.
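The "analyze, then SIGKILL" idea can be sketched as a small helper. `analyze_then_kill` and its `analyze` callback are hypothetical names for illustration. The real change added an option to the hang analyzer and updated assert.soon() to use it (see the master commit above), and this is not that code. SIGKILL cannot be caught and produces no core, so nothing is left for a later SIGABRT to overwrite. POSIX-only.

```python
import os
import signal
from typing import Callable, Iterable, List


def analyze_then_kill(pids: Iterable[int], analyze: Callable[[List[int]], None]) -> List[int]:
    """Run a diagnostic pass over `pids`, then SIGKILL each process.

    `analyze` stands in for the hang analyzer (collecting stacks and cores).
    Returns the pids that were signalled; processes that already exited are skipped.
    """
    pids = list(pids)
    analyze(pids)  # diagnostics first, while the processes are still alive
    killed = []
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)  # no core is written for SIGKILL
        except ProcessLookupError:
            continue  # already gone: nothing left for teardown to abort
        killed.append(pid)
    return killed
```

Killing inside the same step that ran the analysis keeps the ordering explicit: by the time archival runs, the fixture processes are gone, so it never gets the chance to send SIGABRT to them.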

Generated at Thu Feb 08 05:17:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.