[SERVER-17820] Windows service stop can lead to mongod abrupt termination due to long shutdown time Created: 31/Mar/15  Updated: 18/Sep/15  Resolved: 03/Apr/15

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: 3.0.3, 3.1.1

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-17818 Abrupt termination due to service shu... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of Apr 29, 2015

ISSUE SUMMARY
On Windows, when MongoDB runs as a service and takes more than 60 seconds to stop in response to a service stop request, Windows forcibly terminates the process, which can result in an unclean shutdown. This change allows MongoDB to take more than 60 seconds to stop gracefully.

WORKAROUNDS
Upgrade to 3.0.3.

AFFECTED VERSIONS
Versions of MongoDB before 3.0.3.

FIX VERSION
The fix is included in the 3.0.3 production release.

Original description

2015-03-23T20:41:02.102Z I CONTROL  [serviceShutdown] got SERVICE_CONTROL_STOP request from Windows Service Control Manager, will terminate after current cmd ends
2015-03-23T20:41:02.106Z I STORAGE  [conn398] got request after shutdown()
...
2015-03-23T20:41:59.157Z I STORAGE  [conn223] got request after shutdown()
2015-03-23T20:42:12.065Z I CONTROL  ***** SERVER RESTARTED *****
2015-03-23T20:42:12.071Z I CONTROL  Trying to start Windows service 'MongoDB'
2015-03-23T20:42:12.071Z I STORAGE  Service running
2015-03-23T20:42:12.072Z W -        [initandlisten] Detected unclean shutdown - D:\Mongo\data\db\mongod.lock is not empty.
2015-03-23T20:42:12.072Z W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2015-03-23T20:42:12.072Z I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=63G,session_max=20000,eviction=(threads_max=4),statistics=(fast),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-03-23T20:42:12.094Z I -        [initandlisten] Fatal assertion 28548 NoSuchKey Unable to find metadata for table:survey/collection-9-5755622992935745294
2015-03-23T20:42:12.094Z I -        [initandlisten] 
 
***aborting after fassert() failure

  • mongod received SERVICE_CONTROL_STOP request
  • it takes a while to shut down (maybe doing a lengthy checkpoint?)
  • Windows gets impatient after 60 seconds and abruptly terminates mongod
  • recovery fails on subsequent startup

Recovery should succeed (SERVER-17818 tracks that issue) but it would seem desirable to avoid the abrupt termination as well. I believe there is a way to tell Windows that you got the message and are still in the process of shutting down.
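
To illustrate the "still shutting down" idea: on Windows a service normally reports this by passing a SERVICE_STATUS to SetServiceStatus() with the state SERVICE_STOP_PENDING, an increasing dwCheckPoint, and a dwWaitHint. The following is only a portable sketch of that bookkeeping, not the code from the fix. StopStatus, nextStopPending and stopped are hypothetical names, and the struct stands in for the three relevant SERVICE_STATUS fields so the example compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the three SERVICE_STATUS fields that matter here.
// On Windows the real struct is SERVICE_STATUS, passed to SetServiceStatus().
struct StopStatus {
    std::uint32_t currentState;  // mirrors dwCurrentState
    std::uint32_t checkPoint;    // mirrors dwCheckPoint
    std::uint32_t waitHintMs;    // mirrors dwWaitHint
};

// Values match the Windows constants named in the comments.
constexpr std::uint32_t kServiceStopped = 0x00000001;      // SERVICE_STOPPED
constexpr std::uint32_t kServiceStopPending = 0x00000003;  // SERVICE_STOP_PENDING

// Build the next "still stopping" report: state stays pending and the
// checkpoint is bumped by one so the service manager can see progress.
inline StopStatus nextStopPending(const StopStatus& previous, std::uint32_t waitHintMs) {
    assert(waitHintMs > 0);
    return StopStatus{kServiceStopPending, previous.checkPoint + 1, waitHintMs};
}

// Final report once cleanup has finished: checkpoint and wait hint go back to zero.
inline StopStatus stopped() {
    return StopStatus{kServiceStopped, 0, 0};
}
```

In a real service each StopStatus would be copied into a SERVICE_STATUS and sent with SetServiceStatus(); the wait hint value to use is a tuning choice and is not specified in this ticket.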



 Comments   
Comment by Githook User [ 20/Apr/15 ]

Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com>

Message: SERVER-17820: Handle long running exitCleanly in Service Stop
Branch: v3.0
https://github.com/mongodb/mongo/commit/8c1851771a772ee69d5aa1b7e43375c5a951cb92

Comment by Githook User [ 20/Apr/15 ]

Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com>

Message: SERVER-17820: Handle long running exitCleanly in Service Stop

(cherry picked from commit 7060c72b30a836b3052f7890ea8c4b592014adf4)
Branch: v3.0
https://github.com/mongodb/mongo/commit/209e69a909dcb6d321e56688c022570072f9927e

Comment by Githook User [ 03/Apr/15 ]

Author: Mark Benvenuto (markbenvenuto) <mark.benvenuto@mongodb.com>

Message: SERVER-17820: Handle long running exitCleanly in Service Stop
Branch: master
https://github.com/mongodb/mongo/commit/7060c72b30a836b3052f7890ea8c4b592014adf4

Comment by Daniel Pasette (Inactive) [ 31/Mar/15 ]

I'm adding this to "3.1 Required". It can be quite common for the server to take a while to shut down cleanly.

Comment by Mark Benvenuto [ 31/Mar/15 ]

For a normal service stop (SERVICE_CONTROL_STOP), I think it can wait almost indefinitely as long as it continues to report progress correctly (i.e., it is not hung). For SERVICE_CONTROL_SHUTDOWN, it can wait a maximum of 5 seconds on my Windows 8.1 machine, and I think 12 seconds on Windows Server 2008 R2.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms685149%28v=vs.85%29.aspx

We need to call exitCleanly on a separate thread in order to fix this.
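
A rough sketch of that separate-thread shape, under stated assumptions: the cleanup runs on a worker thread while the calling thread wakes up at a fixed interval and reports progress until the cleanup finishes. This is not the committed fix. stopWithProgress and simulateStop are hypothetical names, `cleanup` stands in for exitCleanly, and `reportProgress` stands in for a SetServiceStatus() call with SERVICE_STOP_PENDING and the given checkpoint.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Runs `cleanup` on a worker thread and calls `reportProgress(checkpoint)`
// on the calling thread every `interval` until the cleanup has finished.
// Returns the number of progress reports that were made.
inline unsigned stopWithProgress(const std::function<void()>& cleanup,
                                 const std::function<void(unsigned)>& reportProgress,
                                 std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    std::future<void> done = std::async(std::launch::async, cleanup);
    unsigned checkpoint = 0;
    // wait_for returns `ready` as soon as the worker completes, so a fast
    // cleanup produces no reports and a slow one produces one per interval.
    while (done.wait_for(interval) != std::future_status::ready) {
        reportProgress(++checkpoint);
    }
    done.get();  // rethrows here if the cleanup threw
    return checkpoint;
}

// Example use: a fake 200 ms cleanup with a report every 20 ms.
// Returns the checkpoints that were reported, in order.
inline std::vector<unsigned> simulateStop() {
    std::vector<unsigned> seen;
    stopWithProgress(
        [] { std::this_thread::sleep_for(std::chrono::milliseconds(200)); },
        [&seen](unsigned cp) { seen.push_back(cp); },
        std::chrono::milliseconds(20));
    return seen;
}
```

The progress callback runs only on the calling thread, so it needs no locking of its own; whether the real service handler can block like this or must return and report from another thread is outside what this sketch shows.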
