[SERVER-36812] Log obvious details when resmoke observes killed processes Created: 22/Aug/18 Updated: 29/Oct/23 Resolved: 19/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.7 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | David Bradford (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dag, tig-qwin-eligible, tig-resmoke | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | DAG 2018-12-31 |
| Participants: | |
| Story Points: | 2 |
| Description |
|
In BF-10349, the shell crashed with a segfault but didn't print a stack trace on exit. Resmoke logged that the test exited with -11. However, there were 10 mongo shells, so it's not clear which one crashed, or even that it was a shell that crashed. We have core dumps in this case, which contain sufficient stack traces for debugging. It would be great if the error message indicated that a core dump is available and which process the developer should look into.
Resmoke may also start mongods; I'm not sure whether their exit error messages are clear. It would be great if it were obvious who observed the crash and if the error message from resmoke were consistent with that from the shell (e.g. ReplSetTest). |
| Comments |
| Comment by Githook User [ 19/Dec/18 ] | |||||||||||||||||
|
Author: David Bradford (dbradf, david.bradford@mongodb.com) Message: | |||||||||||||||||
| Comment by Max Hirschhorn [ 07/Nov/18 ] | |||||||||||||||||
|
Exit code 14 would probably be another good one to call out, so that fassert() failures can be identified. | |||||||||||||||||
| Comment by Mark Benvenuto [ 06/Nov/18 ] | |||||||||||||||||
|
The only other one that hits on Windows is stack overflow. It is defined in ntstatus.h as STATUS_STACK_OVERFLOW ((NTSTATUS)0xC00000FDL). | |||||||||||||||||
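For illustration only, the two non-signal codes named in the comments above (14 for fassert() failures and 0xC00000FD for a Windows stack overflow) could be kept in a small lookup table. This is a sketch, not the change that was committed: `KNOWN_EXIT_CODES` and `describe_exit_code` are hypothetical names, and the masking step assumes a Windows status code may be reported either as an unsigned value or as a negative signed 32-bit integer.

```python
# Hypothetical lookup table holding only the two codes named in the comments.
KNOWN_EXIT_CODES = {
    14: "fassert() failure",
    0xC00000FD: "Windows STATUS_STACK_OVERFLOW",
}


def describe_exit_code(code):
    """Return a description for a known exit code, or None if unrecognized."""
    # Assumption: an NTSTATUS value may arrive as a negative signed 32-bit
    # integer, so fold it to its unsigned form before the lookup.
    return KNOWN_EXIT_CODES.get(code & 0xFFFFFFFF)
```

A caller could append the returned description to the existing "exited with N" message whenever it is not None.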
| Comment by David Bradford (Inactive) [ 06/Nov/18 ] | |||||||||||||||||
|
We should be sure to cover:
| |||||||||||||||||
| Comment by Siyuan Zhou [ 22/Aug/18 ] | |||||||||||||||||
|
Here is another case in BF-10373. The test log says the FSM client exited with -4.
However, the task log says the test failed with -3.
It is not obvious from the log messages which part of the test infrastructure reported each code, nor which processes failed with these exit codes. |
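The negative codes quoted in this ticket (-11, -4, -3) follow the POSIX convention used by Python's `subprocess` module, where a return code of -N means the child was terminated by signal N. A minimal sketch of a more explicit log message, assuming a POSIX host, might look like the following. The function name `describe_killed_process` and the message wording are hypothetical and are not taken from resmoke.

```python
import signal


def describe_killed_process(name, pid, returncode):
    """Build a log line naming the process and, if applicable, the signal."""
    if returncode >= 0:
        return f"{name} (pid {pid}) exited with code {returncode}"
    try:
        # subprocess reports death by signal N as return code -N on POSIX.
        sig_name = signal.Signals(-returncode).name
    except ValueError:
        sig_name = f"unknown signal {-returncode}"
    return f"{name} (pid {pid}) was killed by {sig_name} (exit code {returncode})"
```

Including the process name and pid in the message addresses the question of which of several shells crashed, and the pid could also help match the message to a core dump.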