Currently the test lib asserts that on SIGINT the process exits with code 130, which corresponds to resmoke being interrupted while running the test suite. If resmoke is interrupted while doing other work (e.g. parsing YAML), the returned code will be -2 (-SIGINT) and the assertion will fail.
The assertion should accept both codes.
BF-25469
The assertion might fail depending on timing, because the error returned depends on what the resmoke call is doing when SIGINT is received.
Clean and non-clean exits are handled differently in shell_utils_launcher.cpp.
- When resmoke is killed while actually running the suite, resmoke handles the interrupt and exits cleanly: error 130 is returned and the assertion passes. executor.py catches KeyboardInterrupt while running test suites, and resmoke sets the exit code for the suite.
- When resmoke is killed while not running the suite (in this case, while parsing YAML), the exception is not caught, the error returned is -2 (-SIGINT), and the assertion fails. See the Python stack trace above with the KeyboardInterrupt exception.
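The two cases above can be reproduced outside resmoke with a minimal Python sketch. The child scripts here are hypothetical stand-ins, not resmoke code: one catches the interrupt and exits with 130, the other is terminated by SIGINT itself. It assumes a POSIX system, where Python's `subprocess` reports a signal death as a negative return code, which matches the -2 described in this ticket.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for resmoke interrupted while running a suite:
# the interrupt is caught and the process exits cleanly with 128 + SIGINT.
HANDLED = (
    "import sys\n"
    "try:\n"
    "    raise KeyboardInterrupt\n"
    "except KeyboardInterrupt:\n"
    "    sys.exit(130)\n"
)

# Hypothetical stand-in for resmoke interrupted elsewhere (e.g. parsing YAML):
# nothing handles the interrupt, so the process is terminated by SIGINT.
UNHANDLED = (
    "import os, signal\n"
    "signal.signal(signal.SIGINT, signal.SIG_DFL)\n"
    "os.kill(os.getpid(), signal.SIGINT)\n"
)


def exit_code_of(script: str) -> int:
    """Run the script in a child interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-c", script]).returncode


handled_code = exit_code_of(HANDLED)      # clean exit after handling the interrupt
unhandled_code = exit_code_of(UNHANDLED)  # killed by the signal (POSIX only)
print(handled_code, unhandled_code)
```

Which of the two codes the parent sees depends only on whether the child had a handler in place when the signal arrived, which is why the failure is timing dependent.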
Not sure whether this could be fixed in resmoke, or whether it would make sense to change it. Repro: [^interrupt_test.js][^dummy_sleep.js]
For now we can fix the assertion by also checking for -2.
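As an illustration of the relaxed check, here is a small Python sketch. The actual change would be made in the JavaScript test library; the names `ACCEPTED_INTERRUPT_CODES` and `assert_interrupted` are hypothetical and exist only for this example.

```python
import signal

# 130 = 128 + SIGINT: resmoke caught the interrupt and exited cleanly.
EXIT_INTERRUPTED_CLEANLY = 128 + int(signal.SIGINT)
# -2 = -SIGINT: the process was terminated by the signal before handling it.
EXIT_KILLED_BY_SIGINT = -int(signal.SIGINT)

# Hypothetical name: the set of exit codes treated as a valid interrupt.
ACCEPTED_INTERRUPT_CODES = frozenset(
    {EXIT_INTERRUPTED_CLEANLY, EXIT_KILLED_BY_SIGINT}
)


def assert_interrupted(exit_code: int) -> None:
    """Raise AssertionError unless exit_code is one of the accepted codes."""
    if exit_code not in ACCEPTED_INTERRUPT_CODES:
        raise AssertionError(
            f"expected one of {sorted(ACCEPTED_INTERRUPT_CODES)}, got {exit_code}"
        )


assert_interrupted(130)  # interrupted while running the suite
assert_interrupted(-2)   # interrupted while doing other work
```

Any other exit code, including 0, still fails the check, so a child that was never interrupted is not silently accepted.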
Another question that arises is why the FSM client is so slow to start: in the failing instance, the FSM client is killed before it has actually done anything. I guess this is probably due to a saturated system.
Related to: SERVER-72449 backup_restore.js should check for code 2 when killing child resmoke client (Closed)