[SERVER-67390] backup_restore.js should check for code -SIGINT due to unclean SIGINT Created: 20/Jun/22  Updated: 29/Oct/23  Resolved: 27/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.1, 5.0.15, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Yujin Kang Park Assignee: Yujin Kang Park
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-72449 backup_restore.js should check for co... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0
Sprint: Execution Team 2022-06-27, Execution Team 2022-07-11
Participants:
Linked BF Score: 25

 Description   

Currently the test lib asserts that on SIGINT the process exits with code 130, which corresponds to resmoke being interrupted while running the test suite. If resmoke is interrupted while doing other work (e.g. parsing YAML), the returned code will be -2 (-SIGINT) and the assertion will fail.

The assertion should accept both codes.

 

Context from BF-25469:
The assertion might fail depending on timing, because different exit codes are returned depending on the state of the resmoke call when SIGINT is received.

Clean and unclean exits are handled differently in shell_utils_launcher.cpp.

  • When resmoke is killed while actually running the suite, resmoke handles the interrupt and exits cleanly: exit code 130 is returned and the assertion passes. executor.py catches KeyboardInterrupt while running test suites, and resmoke sets the exit code for the suite.
  • When resmoke is killed while not running the suite (in this case, while parsing YAML), the exception is not caught, the returned code is -2 (-SIGINT), and the assertion fails. See the Python stack trace above with the KeyboardInterrupt exception.
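The two cases above can be reproduced outside resmoke with a minimal Python sketch. It assumes a POSIX system and Python 3.8 or later, where an uncaught KeyboardInterrupt makes the interpreter exit via SIGINT. The child scripts and the helper `exit_code_after_sigint` are hypothetical stand-ins, not resmoke code.

```python
import signal
import subprocess
import sys

# Hypothetical child that mimics the "clean" case: it catches the
# interrupt and exits with 130 itself.
HANDLED_CHILD = """
import signal, sys, time
# Install the default handler explicitly in case SIGINT is inherited as ignored.
signal.signal(signal.SIGINT, signal.default_int_handler)
try:
    print("ready", flush=True)
    time.sleep(30)
except KeyboardInterrupt:
    sys.exit(130)
"""

# Hypothetical child that mimics the "unclean" case: KeyboardInterrupt
# propagates uncaught, so the process dies from the signal.
UNHANDLED_CHILD = """
import signal, time
signal.signal(signal.SIGINT, signal.default_int_handler)
print("ready", flush=True)
time.sleep(30)
"""


def exit_code_after_sigint(child_source):
    """Start a child, wait until it reports ready, send SIGINT, return its exit code."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_source],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the child has printed "ready"
    proc.send_signal(signal.SIGINT)
    code = proc.wait(timeout=10)
    proc.stdout.close()
    return code


if __name__ == "__main__":
    print("handled:  ", exit_code_after_sigint(HANDLED_CHILD))
    print("unhandled:", exit_code_after_sigint(UNHANDLED_CHILD))
```

Under these assumptions the first child reports the code it chose (130), while the second is reported as terminated by the signal, which `subprocess` exposes as a negative signal number.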

Not sure whether this could be fixed in resmoke, or whether it would make sense to change it. Repro: [^interrupt_test.js][^dummy_sleep.js]

For now we can fix the assertion by also checking for -2.
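The relaxed check could look roughly like the sketch below. The actual fix lives in the JavaScript test lib. This is a Python illustration of the same logic, and the names `CLEAN_INTERRUPT_CODE`, `UNCLEAN_INTERRUPT_CODE`, and `assert_interrupted` are hypothetical.

```python
import signal

# Exit codes taken from the ticket description.
CLEAN_INTERRUPT_CODE = 130               # resmoke handled the interrupt itself
UNCLEAN_INTERRUPT_CODE = -signal.SIGINT  # -2: process was killed by SIGINT


def assert_interrupted(exit_code):
    """Accept either exit code that an interrupted resmoke run may report."""
    allowed = (CLEAN_INTERRUPT_CODE, UNCLEAN_INTERRUPT_CODE)
    assert exit_code in allowed, f"expected one of {allowed}, got {exit_code}"
```

Any other exit code still fails the assertion, so unrelated failures are not masked.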

Another question that arises is why the FSM client is so slow to start: in the failing instance, the FSM client is killed before it has actually done anything. I guess this is probably due to a saturated system.



 Comments   
Comment by Githook User [ 31/Jan/23 ]

Author:

{'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}

Message: SERVER-67390: backup_restore.js should also allow exit code -SIGINT

(cherry picked from commit e71996210dbfa8c6783c83dec9a828bcb675c001)
Branch: v5.0
https://github.com/mongodb/mongo/commit/948b8eb1f1f697949e2e46cf88f6d3da76afce8a

Comment by Githook User [ 25/Jul/22 ]

Author:

{'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}

Message: SERVER-67390: backup_restore.js should also allow exit code -SIGINT
Branch: v6.0
https://github.com/mongodb/mongo/commit/0260e3c2d632dccc332cb0f1379dff7f8e281d3c

Comment by Githook User [ 27/Jun/22 ]

Author:

{'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}

Message: SERVER-67390: backup_restore.js should also allow exit code -SIGINT
Branch: master
https://github.com/mongodb/mongo/commit/e71996210dbfa8c6783c83dec9a828bcb675c001

Generated at Thu Feb 08 06:08:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.