[SERVER-24249] shell process handling improvements Created: 22/May/16  Updated: 05/Apr/17  Resolved: 12/Oct/16

Status: Closed
Project: Core Server
Component/s: Shell
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Eric Milkie Assignee: Matt Cotter
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-25777 StopMongoProgram shouldn't implicitly... Closed
Backwards Compatibility: Fully Compatible
Sprint: Platforms 2016-10-31
Participants:
Linked BF Score: 0

 Description   

killDb() needs to have a longer timeout (it currently only waits 60 seconds before dropping the hammer), and it should also not wait so long after sending a kill signal (in fact, I'm pretty sure we don't need to wait at all except on Windows).



 Comments   
Comment by Matt Cotter [ 10/Oct/16 ]

I'm in favor of exposing a new SignalMongoProgramByPid (or something) method. This would allow us to kill processes manually inside of an assert.soon() by sending the SIGTERM ourselves.

I don't love having to kill the processes ourselves, but this way at least it's tunable on a case-by-case basis.
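
The idea in this comment (send the signal ourselves, then poll with a deadline chosen by the caller) might look roughly like the sketch below. It is a POSIX-only illustration and not the shell's implementation: signalAndWait is a hypothetical name, and the sketch assumes the target is a child of the calling process so that waitpid() can reap it.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of the mongo shell): send `sig` to a child
// process, then poll until it exits or `timeout` elapses.
// Returns true if the child was reaped before the deadline.
bool signalAndWait(pid_t pid, int sig,
                   std::chrono::milliseconds timeout,
                   std::chrono::milliseconds pollInterval) {
    if (kill(pid, sig) != 0) {
        return false;  // the signal could not be delivered
    }
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        int status = 0;
        const pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            return true;  // child terminated and was reaped
        }
        if (r == -1) {
            return false;  // not our child, or already reaped elsewhere
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // still running; the caller decides what to do next
        }
        std::this_thread::sleep_for(pollInterval);
    }
}
```

A caller that gets false back can escalate (for example by calling the helper again with SIGKILL), which is where the per-test tuning would happen.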

Comment by Eric Milkie [ 24/May/16 ]

The BF ticket linked here will need a different solution if the timeout remains at 60 seconds. I'm not sure what else to do for it other than to try to make shutdown speedier.

Comment by Mira Carey [ 24/May/16 ]

milkie,

Is this still a problem if the timeout is 60 seconds (as Max is saying)?

Comment by Max Hirschhorn [ 22/May/16 ]

"killDb() needs to have a longer timeout (it currently only waits 6 seconds before dropping the hammer)"

The killDb() function waits 1 minute before sending a SIGKILL to the process. Each iteration of the loop below waits for 100 milliseconds and on the 600th iteration, a SIGKILL is sent.

for (int i = 0; i < 1300; ++i) {
    if (i == 600) {
        log() << "process on port " << port << ", with pid " << pid
              << " not terminated, sending sigkill";
        kill_wrapper(pid, SIGKILL, port, opt);
        killSignalSent = true;
    }
    processTerminated = wait_for_pid(pid, false, &exitCode);
    if (processTerminated) {
        break;
    }
    sleepmillis(100);
}


On the 3.0 branch, each iteration of the loop sleeps for 1 second (instead of 100 milliseconds), and the SIGKILL is sent after the 60th of the 130 iterations.

int i = 0;
for (; i < 130; ++i) {
    if (i == 60) {
        log() << "process on port " << port << ", with pid " << pid
              << " not terminated, sending sigkill" << endl;
        kill_wrapper(pid, SIGKILL, port, opt);
    }
    if (wait_for_pid(pid, false, &exitCode))
        break;
    sleepmillis(1000);
}
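
The arithmetic behind both loops can be checked with a small sketch. It ignores the time spent inside wait_for_pid() itself, and the PollLoop struct and the constant names are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>

// Timing of a fixed-interval polling loop, ignoring the cost of the poll call.
struct PollLoop {
    int iterations;        // loop bound
    int sigkillIteration;  // iteration at which SIGKILL is sent
    int sleepMillis;       // sleep per iteration

    // Time spent sleeping before the SIGKILL is sent.
    constexpr int millisBeforeSigkill() const {
        return sigkillIteration * sleepMillis;
    }
    // Upper bound on time spent sleeping after the SIGKILL is sent.
    constexpr int millisAfterSigkill() const {
        return (iterations - sigkillIteration) * sleepMillis;
    }
};

// Values taken from the two loops quoted above.
constexpr PollLoop kCurrent{1300, 600, 100};
constexpr PollLoop kBranch30{130, 60, 1000};

static_assert(kCurrent.millisBeforeSigkill() == 60000, "60 s before SIGKILL");
static_assert(kBranch30.millisBeforeSigkill() == 60000, "60 s before SIGKILL");
static_assert(kCurrent.millisAfterSigkill() == 70000, "up to 70 s after SIGKILL");
static_assert(kBranch30.millisAfterSigkill() == 70000, "up to 70 s after SIGKILL");
```

Under these assumptions both versions wait the same 60 seconds before SIGKILL and then poll for up to another 70 seconds; only the polling granularity differs.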

Generated at Thu Feb 08 04:05:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.