[SERVER-40702] resmoke.py should wait for subprocesses it spawned to exit on KeyboardInterrupt
Created: 17/Apr/19  Updated: 29/Oct/23  Resolved: 03/Jul/19

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.2.0-rc3, 4.3.1

Type: Improvement
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Robert Guo (Inactive)
Resolution: Fixed
Votes: 0
Labels: tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-40518 backup_restore*.js tests send SIGTERM... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.2
Sprint: STM 2019-07-15
Participants:
Linked BF Score: 15
Story Points: 1

 Description   

resmoke.py doesn't wait for the job threads running tests to exit when the user interrupts it. Instead, it relies on every process in the process group receiving the SIGINT and exiting quickly on its own. While this may make it less likely that a user interrupts resmoke.py multiple times because it is slow to exit, it also means that processes spawned by resmoke.py may outlive the resmoke.py Python process. This behavior has caused failures in the backup_restore*.js tests, which spawn their own resmoke.py subprocess in order to run FSM workloads against a ReplSetTest instance.

We should call thr.join() even after a KeyboardInterrupt exception occurs. However, it would be convenient for users if we also logged a message (say, after 2 seconds of waiting for the thread) telling them how to force the processes to exit: on Linux, ctrl-\ sends a SIGQUIT to all of the processes; on Windows, pressing ctrl-c again works because the Job object has JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE set. Sending a SIGQUIT is an easy way to ensure resmoke.py exits even if the mongod process is hung.

def _run_tests(self, test_queue, setup_flag, teardown_flag):
    """Start a thread for each Job instance and block until all of the tests are run.
    Returns a (combined report, user interrupted) pair, where the
    report contains the status and timing information of tests run
    by all of the threads.
    """
 
    threads = []
    interrupt_flag = threading.Event()
    user_interrupted = False
    try:
        # Run each Job instance in its own thread.
        for job in self._jobs:
            thr = threading.Thread(
                target=job, args=(test_queue, interrupt_flag), kwargs=dict(
                    setup_flag=setup_flag, teardown_flag=teardown_flag))
            # Do not wait for tests to finish executing if interrupted by the user.
            thr.daemon = True
            thr.start()
            threads.append(thr)
            # SERVER-24729 Need to stagger when jobs start to reduce I/O load if there
            # are many of them.  Both the 5 and the 10 are arbitrary.
            # Currently only enabled on Evergreen.
            if _config.STAGGER_JOBS and len(threads) >= 5:
                time.sleep(10)
 
        joined = False
        while not joined:
            # Need to pass a timeout to join() so that KeyboardInterrupt exceptions
            # are propagated.
            joined = test_queue.join(TestSuiteExecutor._TIMEOUT)
    except (KeyboardInterrupt, SystemExit):
        interrupt_flag.set()
        user_interrupted = True
    else:
        # Only wait for all the Job instances if not interrupted by the user.
        self.logger.debug("Waiting for threads to complete")
        for thr in threads:
            thr.join()
        self.logger.debug("Threads are completed!")
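
The change proposed in the description could look roughly like the sketch below. This is not the committed fix. The helper name join_threads_with_hint, the constant HINT_DELAY_SECS, and the poll_interval parameter are hypothetical, chosen only for illustration. The sketch assumes the interrupt flag has already been set, and it joins with a timeout for the same reason the existing code does: so that a second KeyboardInterrupt can still reach the main thread.

```python
import logging
import sys
import threading
import time

# Hypothetical constant: how long to wait before telling the user how to force an exit.
HINT_DELAY_SECS = 2.0


def join_threads_with_hint(threads, logger, hint_delay=HINT_DELAY_SECS, poll_interval=0.1):
    """Wait for every thread to exit, logging a one-time hint if that takes a while.

    Returns True if the hint was logged, False if all threads exited before the delay.
    """
    if sys.platform == "win32":
        hint = "Still waiting for jobs to exit; press ctrl-c again to force them to exit."
    else:
        hint = "Still waiting for jobs to exit; press ctrl-\\ to send SIGQUIT to all processes."

    deadline = time.monotonic() + hint_delay
    hinted = False
    for thr in threads:
        while thr.is_alive():
            # Join in short slices rather than blocking indefinitely, so that the
            # main thread stays responsive to a further interrupt.
            thr.join(poll_interval)
            if not hinted and thr.is_alive() and time.monotonic() >= deadline:
                logger.info(hint)
                hinted = True
    return hinted
```

In _run_tests, a helper like this would be called for the job threads in the except branch after interrupt_flag.set(), in place of skipping the join entirely.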



 Comments   
Comment by Githook User [ 03/Jul/19 ]

Author: Robert Guo (guoyr) <robert.guo@10gen.com>

Message: SERVER-40702 wait for processes to exit on KeyboardInterrupt in resmoke
Branch: v4.2
https://github.com/mongodb/mongo/commit/60d7c98c0046f66422907e22e71118206b756c18

Comment by Githook User [ 03/Jul/19 ]

Author: Robert Guo (guoyr) <robert.guo@10gen.com>

Message: SERVER-40702 wait for processes to exit on KeyboardInterrupt in resmoke
Branch: master
https://github.com/mongodb/mongo/commit/f747b5477375197f4bef6e2e06899f7f974b9151
