[SERVER-9253] `rpm/init.d-mongod stop` unnecessarily sleeps for 5mn Created: 05/Apr/13  Updated: 09/Sep/14  Resolved: 20/Aug/14

Status: Closed
Project: Core Server
Component/s: Packaging
Affects Version/s: 2.4.1
Fix Version/s: 2.7.6

Type: Bug Priority: Minor - P4
Reporter: Alexis Midon Assignee: Benety Goh
Resolution: Done Votes: 4
Labels: community-team, pull-request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-6014 init.d/mongod causing 5 minute delay ... Closed
Related
is related to SERVER-11973 Remove delay of 300 seconds on rpm in... Closed
Tested
Operating System: ALL
Participants:

 Description   

`/etc/init.d/mongod stop` uses killproc with a 5-minute delay between TERM and KILL.
In most cases this 5-minute sleep is too long, and the stop command hangs unnecessarily even though the process has already exited successfully.

This ticket is to re-implement the same 5-minute delay with a busy-wait loop that returns as soon as the process exits.



 Comments   
Comment by Githook User [ 20/Aug/14 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-9253 added pid_file argument to mongo_killproc. New usage: mongo_killproc pid_file procname
Branch: master
https://github.com/mongodb/mongo/commit/a8d7c7b8559e8a4fe85e13de459a961099e8a04d

Comment by Githook User [ 20/Aug/14 ]

Author: Alexis Midon (alexism) <alexismidon@gmail.com>

Message: SERVER-9253 fixed init.d/mongod so that it does not sleep for 5 minutes unnecessarily.
In some versions of Linux, killproc() provided in /etc/init.d/functions has a bug
where it will sleep the full duration of the delay (-d).

Closes #411

Signed-off-by: Benety Goh <benety@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/fb1e82a243bc7c2b96ea1cfa78370f08a4c59bc6

Comment by Benety Goh [ 15/Aug/14 ]

init.d/mongod relies on the killproc() function in /etc/init.d/functions to stop the mongod process.

Commit 9369bf1568b73061fe29670b4faae80c6507d56f might be relevant to this SERVER ticket:

Description: Make killproc more granular when delay is passed. (#428029, <xjakub@fi.muni.cz>)

URL: https://git.fedorahosted.org/cgit/initscripts.git/commit/rc.d/init.d/functions?id=9369bf1568b73061fe29670b4faae80c6507d56f

Comment by Web Systems [ 31/Jan/14 ]

Hi,
I know fixing this issue (1 second proc kill race or wait for 5 minutes) is not particularly urgent, but it's certainly affecting us and can be very confusing and disconcerting the first time you come across it, especially in the heat of production changes. In fact, for one of our applications the mongod process almost never dies within a second.
I for one would be grateful if a fix could find its way into a stable version as soon as possible. We could obviously roll our own custom init.d script (it's not a difficult problem to fix), but we use MongoDB Inc supplied RPMs, so if a committed fix is not bundled with the RPM and we roll out our own custom version of the init.d script, then every time we upgrade we will have to check the init.d script in the RPM against our custom version and try to merge them. As you can probably guess, that's not an easy procedure to automate, so I'm reluctant to go in that direction.
Regards
Michael Allaway

Comment by Alexis Midon [ 23/May/13 ]

Here are some details on the issue. Sorry I should have shared that earlier.

I was getting irritated by `mongo stop` hanging sometimes and I know some of you experienced the same behavior.
So I looked into it.

For the impatient, the short story is that this is not a bug in the init.d script. If the stop command hangs, it means the mongo process didn't die within a second, and the script will wait 5 minutes before sending a SIGKILL (-9).

For the curious, `/etc/init.d/mongo stop` actually executes `killproc -p "$PIDFILE" -d 300 /usr/bin/mongod`.
By specification, killproc first sends a SIGTERM, followed by a SIGKILL after an unspecified number of seconds. That number of seconds is configurable with the -d (delay) option, and /etc/init.d/mongo sets it to 300 seconds (5 minutes).
The only issue with this is that the wait is implemented not as a busy loop but with a single sleep command. So you have to wait the full 5 minutes, even if the process dies sooner.

This is very clear when using `bash -x /etc/init.d/mongo stop`.

Two take-aways from this:
1. If stop hangs, assume mongo is busy trying to shut down gracefully.
2. A small improvement to the script would be to put `kill -TERM` in a loop with a short sleep and a global 5-minute timeout, instead of a single 5-minute sleep.

+ '[' -d /proc/31125 ']'
+ return 0
+ kill -TERM 31125
+ usleep 100000               # wait 1/10 second
+ checkpid 31125
+ local i
+ for i in '$*'
+ '[' -d /proc/31125 ']'      # process is still there 
+ return 0
+ sleep 1                     # give it a second
+ checkpid 31125
+ local i
+ for i in '$*'
+ '[' -d /proc/31125 ']'      # process is still there 
+ return 0
+ sleep 300                   # wait 5 minutes (-d value)
+ checkpid 31125
+ local i
+ for i in '$*'
+ '[' -d /proc/31125 ']'      # process is still there 
+ return 0
+ kill -KILL 31125            # die suck'r!
+ usleep 100000
+ checkpid 31125
+ local i
+ for i in '$*'
+ '[' -d /proc/31125 ']'
+ return 0
+ RC=0
+ '[' 0 -eq 0 ']'
+ failure 'mongod shutdown'
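
Take-away 2 could be sketched roughly as below. This is only an illustration of the idea, not the code that was eventually committed for this ticket: `stop_with_timeout` is a hypothetical helper name, it takes a pid directly rather than a pid file, and it sends TERM once and then polls rather than re-sending the signal on every pass. The one-second poll interval is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical helper for illustration; NOT the mongo_killproc from the
# commits on this ticket. It takes a pid directly rather than a pid file.
stop_with_timeout() {
    pid=$1
    timeout=${2:-300}   # seconds to wait before escalating to KILL

    # Ask the process to shut down. If the signal cannot be delivered
    # (for example, no such pid), treat the process as already stopped.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 delivers no signal; it only tests whether the pid exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Still alive after the full timeout: force it.
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

An init script could call it as `stop_with_timeout "$(cat "$PIDFILE")" 300`. The worst case is still a 5-minute wait followed by a KILL, but a process that exits after a few seconds lets the stop command return after a few seconds too.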

Comment by Ernie Hershey [ 23/May/13 ]

Sure. I commented on the pull request.

Comment by Matt Kangas [ 23/May/13 ]

Ernie, can you take a look at this?
