[SERVER-29045] Service stop fails and returns OK Created: 03/May/17  Updated: 12/Jul/17  Resolved: 09/Jun/17

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.2.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Sometimes when the mongod service is stopped it appears to work but returns a failed code (Return code is1)

Stopping mongod: [FAILED]
[ OK ]

Here are the logs

2017-05-03T03:56:02.712+0000 I NETWORK  [initandlisten] connection accepted from 10.201.165.231:59704 #146519 (22 connections now open)
2017-05-03T03:56:02.727+0000 I ACCESS   [conn146519] Successfully authenticated as principal admin on admin
2017-05-03T03:56:02.782+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:46526 #146520 (23 connections now open)
2017-05-03T03:56:02.798+0000 I ACCESS   [conn146520] Successfully authenticated as principal admin on admin
2017-05-03T03:56:02.802+0000 I NETWORK  [conn146520] end connection 127.0.0.1:46526 (22 connections now open)
2017-05-03T03:56:02.805+0000 I NETWORK  [conn146519] end connection 10.201.165.231:59704 (21 connections
2017-05-03T03:56:03.247+0000 I ACCESS   [conn132770] Successfully authenticated as principal admin on admin
2017-05-03T03:56:08.536+0000 I SHARDING [LockPinger] cluster SG-xxxxxxx-3470.xxxxxxx.com:27019,SG-xxxxxxx-3471.xxxxxxx.com:27019,SG-xxxxxxx
s-3472.xxxxxxx.com:27019 pinged successfully at 2017-05-03T03:56:07.097+0000 by distributed lock pinger 'SG-xxxxxxx-3470.xxxxxxx.com:27019,SG-xxxxx
Access-3471.xxxxxxx.com:27019,SG-xxxxxxx-3472.xxxxxxx.com:27019/ip-10-201-165-231:27017:1488065966:312939207', sleeping for 30000ms
2017-05-03T03:56:11.248+0000 I ACCESS   [conn132770] Successfully authenticated as principal admin on admin
2017-05-03T03:56:17.141+0000 I ACCESS   [conn132687] Successfully authenticated as principal admin on admin
2017-05-03T03:56:18.590+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2017-05-03T03:56:18.641+0000 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
2017-05-03T03:56:18.643+0000 W EXECUTOR [rsBackgroundSync] killCursors command task failed: CallbackCanceled: Callback canceled
2017-05-03T03:56:18.741+0000 I STORAGE  [conn132687] got request after shutdown()
2017-05-03T03:56:18.810+0000 I STORAGE  [conn132688] got request after shutdown()
2017-05-03T03:56:18.821+0000 I STORAGE  [conn132770] got request after shutdown()
2017-05-03T03:56:18.881+0000 I STORAGE  [conn139695] got request after shutdown()
2017-05-03T03:56:18.935+0000 I NETWORK  [conn132686] end connection 54.160.190.235:41800 (15 connections now open)
2017-05-03T03:56:19.127+0000 I STORAGE  [conn145601] got request after shutdown()
2017-05-03T03:56:19.221+0000 I REPL     [signalProcessingThread] Stopping replication applier threads
2017-05-03T03:56:19.252+0000 I STORAGE  [conn146517] got request after shutdown()
2017-05-03T03:56:20.127+0000 I STORAGE  [conn129461] got request after shutdown()
2017-05-03T03:56:20.226+0000 I STORAGE  [conn145584] got request after shutdown()
2017-05-03T03:56:21.154+0000 W SHARDING [LockPinger] removing distributed lock ping thread 'SG-xxxxxxx-3470.xxxxxxx.com:27019,SG-xxxxxxx-3471.servers.mongodi
rector.com:27019,SG-xxxxxxx-3472.xxxxxxx.com:27019/ip-10-201-165-231:27017:1488065966:312939207'
2017-05-03T03:56:21.168+0000 W SHARDING [LockPinger] Error encountered while stopping ping on ip-10-201-165-231:27017:1488065966:312939207 :: caused by :: 17382 Can't use connection
 pool during shutdown
2017-05-03T03:56:21.185+0000 I CONTROL  [signalProcessingThread] now exiting
2017-05-03T03:56:21.185+0000 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2017-05-03T03:56:21.185+0000 I NETWORK  [signalProcessingThread] closing listening socket: 6
2017-05-03T03:56:21.185+0000 I NETWORK  [signalProcessingThread] closing listening socket: 7
2017-05-03T03:56:21.198+0000 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2017-05-03T03:56:21.198+0000 I NETWORK  [signalProcessingThread] shutdown: going to flush diaglog...
2017-05-03T03:56:21.198+0000 I NETWORK  [signalProcessingThread] shutdown: going to close sockets...
2017-05-03T03:56:21.198+0000 I NETWORK  [conn146511] end connection 10.255.131.241:39330 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn146498] end connection 50.16.94.100:40564 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn30573] end connection 10.192.9.74:55012 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn146499] end connection 10.251.11.136:60010 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn132682] end connection 23.22.19.102:34360 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn146516] end connection 54.220.226.176:60632 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn133550] end connection 54.155.99.20:37740 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn133447] end connection 54.145.251.204:52632 (10 connections now open)
2017-05-03T03:56:21.198+0000 I NETWORK  [conn132685] end connection 54.92.136.187:48954 (10 connections now open)
2017-05-03T03:56:21.208+0000 I NETWORK  [conn5416] end connection 10.201.165.231:50956 (1 connection now open)
2017-05-03T03:56:21.208+0000 I STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
2017-05-03T03:56:21.243+0000 I NETWORK  [conn74859] end connection 184.73.125.171:46248 (0 connections now open)



 Comments   
Comment by Mark Agarunov [ 09/Jun/17 ]

Hello dharshanr@scalegrid.net,

The output you are seeing is most likely due to the service command being a shim to systemd's systemctl command. service is returning OK even though the underlying systemctl command is returning FAILED. The failure you are seeing is likely coming from systemd / systemctl, the init system being used.

Thanks,
Mark

Comment by Dharshan Rangegowda [ 24/May/17 ]

Hi Mark,

The command being issued is "service mongod stop". So this command should either fail with [FAILED] or succeed with [OK]. Currently it is failing printing both.

How is this a system issue? This looks like a bug in the implementation of the above command - hence it is filed here.

Comment by Mark Agarunov [ 24/May/17 ]

Hello dharshanr@scalegrid.net,

What is the exact command being issued when you are seeing this? If you are executing systemctl restart mongodb the [FAILED] is from stopping the service and the [OK] is from starting it. Additionally, this appears to be an issue with systemd, not MongoDB itself. Please note that SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group.

Thanks,
Mark

Comment by Dharshan Rangegowda [ 23/May/17 ]

Hi Mark,

If mongod failed to stop in time shouldn't the operation return a failure? In this case we are also seeing [OK]. My expectation is that if it failed to stop I would only see a [FAILED] result.

Comment by Mark Agarunov [ 19/May/17 ]

Hello dharshanr@scalegrid.net,

Looking over the output you've provided, I believe this may be due to mongod taking longer to stop than the TimeoutStopSec value of systemd. Please try adjusting this value in the systemd unit file to see if the issue persists.

Thanks,
Mark

Comment by Dharshan Rangegowda [ 04/May/17 ]

1. The OS is amazon linux
2. Installed using the rpm
3. Mongodb service files have not been altered in any way.

Comment by Mark Agarunov [ 04/May/17 ]

Hello dharshanr@scalegrid.net,

Thank you for the report. From the output you've provided and the behavior you describe, I suspect the issue is due to systemd timing out before MongoDB has fully stopped. To better diagnose this behavior, I'd like to clarify a couple things:

  • Which operating system/distribution/version is this machine using?
  • How was MongoDB installed on this system? (tarball, package manager, etc)
  • Has the systemd unit file (mongod.service) been modified in any way?

Thanks,
Mark

Generated at Thu Feb 08 04:19:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.