[SERVER-36812] Log obvious details when resmoke observes killed processes Created: 22/Aug/18  Updated: 29/Oct/23  Resolved: 19/Dec/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.1.7

Type: Improvement Priority: Major - P3
Reporter: Siyuan Zhou Assignee: David Bradford (Inactive)
Resolution: Fixed Votes: 0
Labels: dag, tig-qwin-eligible, tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: DAG 2018-12-31
Participants:
Story Points: 2

 Description   

In BF-10349, the shell crashed with a segfault but did not print a stack trace on exit. Resmoke logged that the test exited with -11. However, there were 10 mongo shells, so it is not clear which one crashed, or even that it was a shell that crashed. We have core dumps in this case, and they contain a sufficient stack trace for debugging. It would be great if the error message indicated that a core dump is available and which process the developer should look into.

 [2018/08/19 15:49:33.497] [executor:js_test:job0] 2018-08-19T19:49:33.495+0000 Received a StopExecution exception: JSTest jstestfuzz/out/jstestfuzz-6828-ent_fe14-qa_a6ce-1534707044622-33.js failed.
 [2018/08/19 15:49:33.684] [executor] 2018-08-19T19:49:33.684+0000 Summary: 67 test(s) ran in 1040.76 seconds (66 succeeded, 41 were skipped, 1 failed, 0 errored)
 [2018/08/19 15:49:33.684]     The following tests failed (with exit code):
 [2018/08/19 15:49:33.684]         jstestfuzz/out/jstestfuzz-6828-ent_fe14-qa_a6ce-1534707044622-33.js (-11)

Resmoke may also start mongods, and I'm not sure whether their exit error messages are clear. It would be great if it were obvious who observed the crash and if the error message from resmoke were consistent with that from the shell (e.g. ReplSetTest).
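
As a rough illustration of the kind of message requested here, the following Python sketch names the process, its pid, and the signal behind a negative exit code. The function name `describe_exit`, its parameters, the pid, and the message wording are all hypothetical and are not resmoke's actual API. It relies only on the POSIX convention, followed by Python's `subprocess` module, that a return code of -N means the child was terminated by signal N. Windows exit codes derived from NTSTATUS values are not signals and would need separate handling.

```python
import signal


def describe_exit(process_name, pid, returncode):
    """Build a log line saying which process exited and why (hypothetical helper)."""
    if returncode < 0:
        # POSIX convention used by subprocess: -N means "killed by signal N".
        try:
            reason = "terminated by signal " + signal.Signals(-returncode).name
        except ValueError:
            reason = "terminated by unknown signal %d" % -returncode
        # Assumption: core dumps are being collected for this task.
        hint = "; check for a core dump from this pid"
    else:
        reason = "exited normally" if returncode == 0 else "exited with an error"
        hint = ""
    return "%s (pid %d) exited with code %d: %s%s" % (
        process_name, pid, returncode, reason, hint)


# Example with a made-up pid and the -11 exit code from the log above.
print(describe_exit("mongo shell", 12345, -11))
```

Including the process name and pid in the same line as the decoded exit code is what would let a reader pick the right core dump when several shells are running.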



 Comments   
Comment by Githook User [ 19/Dec/18 ]

Author: David Bradford (dbradf) <david.bradford@mongodb.com>

Message: SERVER-36812: Add human readable messages for exit codes resmoke sees
Branch: master
https://github.com/mongodb/mongo/commit/8df4497cfc7403a9063b79ed97c3e3e489ea35e3

Comment by Max Hirschhorn [ 07/Nov/18 ]

Code 14 would probably be another good one to cover, so that we can call out fassert() failures.

Comment by Mark Benvenuto [ 06/Nov/18 ]

The only other one that hits on Windows is stack overflow. It is defined in ntstatus.h.

STATUS_STACK_OVERFLOW ((NTSTATUS)0xC00000FDL)
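
A short Python sketch of how this NTSTATUS value relates to the negative exit codes seen in logs. It assumes the exit code is reported as a signed 32-bit integer, which matches the -1073741819 figure quoted for access violations elsewhere in this ticket (`STATUS_ACCESS_VIOLATION` is 0xC0000005). The helper name `ntstatus_to_exit_code` is hypothetical.

```python
def ntstatus_to_exit_code(status):
    """Reinterpret an unsigned 32-bit NTSTATUS value as a signed 32-bit integer."""
    status &= 0xFFFFFFFF
    # Values with the top bit set wrap around to negative numbers.
    return status - 0x100000000 if status & 0x80000000 else status


STATUS_ACCESS_VIOLATION = 0xC0000005
STATUS_STACK_OVERFLOW = 0xC00000FD

print(ntstatus_to_exit_code(STATUS_ACCESS_VIOLATION))  # -1073741819
print(ntstatus_to_exit_code(STATUS_STACK_OVERFLOW))    # -1073741571
```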

Comment by David Bradford (Inactive) [ 06/Nov/18 ]

We should be sure to cover:

  • exit codes in: dbshell.cpp, src/mongo/shell/dbshell.cpp
  • Windows access violation: -1073741819
  • unix exit codes: -6, -9, -11
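
The codes listed above could be captured in a small lookup table, sketched here in Python. This is only an illustration under stated assumptions, not the code from the eventual commit: the names `EXIT_CODE_DESCRIPTIONS`, `describe`, and `format_failure`, the test file name, and the description wording are all hypothetical. The signal names assume the usual Linux numbering (6 is SIGABRT, 9 is SIGKILL, 11 is SIGSEGV).

```python
# Hypothetical table of exit codes worth calling out in the test summary.
EXIT_CODE_DESCRIPTIONS = {
    -6: "SIGABRT (abort)",
    -9: "SIGKILL (killed)",
    -11: "SIGSEGV (segmentation fault)",
    -1073741819: "Windows access violation",
}


def describe(code):
    """Return a human-readable description, or a fallback for unlisted codes."""
    return EXIT_CODE_DESCRIPTIONS.get(code, "unknown exit code")


def format_failure(test_name, code):
    """Format one line of a 'tests failed (with exit code)' summary."""
    return "%s (%d: %s)" % (test_name, code, describe(code))


# Example with a made-up test file name.
print(format_failure("example_test.js", -11))
```

A plain dictionary keeps the table easy to extend with further codes, such as the shell exit codes defined in dbshell.cpp.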
Comment by Siyuan Zhou [ 22/Aug/18 ]

Here is another case in BF-10373. The test log says the FSM client exited with -4.

[js_test:backup_restore_rolling] 2018-08-22T00:12:36.221+0000 2018-08-22T00:12:36.222+0000 I -        [js] shell: stopped mongo program with pid 3864
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 assert failed : backup_restore_rolling FSM client was not running at end of test and exited with code: -4
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 doassert@src/mongo/shell/assert.js:20:14
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 assert@src/mongo/shell/assert.js:150:9
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 BackupRestoreTest/this.run@jstests/noPassthrough/libs/backup_restore.js:410:1
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 @jstests\noPassthrough\backup_restore_rolling.js:40:9
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 @jstests\noPassthrough\backup_restore_rolling.js:19:2
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 2018-08-22T00:12:36.223+0000 E QUERY    [js] Error: assert failed : backup_restore_rolling FSM client was not running at end of test and exited with code: -4 :
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 doassert@src/mongo/shell/assert.js:20:14
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 assert@src/mongo/shell/assert.js:150:9
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 BackupRestoreTest/this.run@jstests/noPassthrough/libs/backup_restore.js:410:1
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.223+0000 @jstests\noPassthrough\backup_restore_rolling.js:40:9
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.224+0000 @jstests\noPassthrough\backup_restore_rolling.js:19:2
[js_test:backup_restore_rolling] 2018-08-22T00:12:36.226+0000 failed to load: jstests\noPassthrough\backup_restore_rolling.js

However, the task log says the test failed with -3.

[2018/08/21 20:18:27.892] The following tests failed (with exit code):
[2018/08/21 20:18:27.892]     jstests\noPassthrough\backup_restore_rolling.js (-3)

Neither the test infrastructure involved nor the processes that failed with these exit codes are obvious from the log messages.

Generated at Thu Feb 08 04:44:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.