[SERVER-2719] mongod running as a service on Windows does not restart correctly after a reboot Created: 09/Mar/11  Updated: 08/Feb/23  Resolved: 15/Mar/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.0-rc1
Fix Version/s: 1.8.1, 1.9.0

Type: Bug Priority: Major - P3
Reporter: Robert Stam Assignee: Tad Marshall
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 7 64-bit with SP1 and all Windows Updates as of 2011-03-07.


Attachments: Text File 0001-SERVER-2719-clean-up-the-log-file-when-Windows-shuts.patch     Zip Archive logs.zip     Text File mongo.log    
Issue Links:
Related
is related to SERVER-2833 mongod.exe does not shut down correct... Closed
Operating System: Windows
Participants:

 Description   

Steps to reproduce:

1. mongod --install --logpath c:\data\logs --logappend
2. net start MongoDB
3. at this point mongod is running as a Windows service
4. reboot

Upon reboot mongod repeatedly fails to start. If you watch its status using the Windows Services Administrative Tool you can see the status alternating between Starting and Started.

Each time mongod attempts to start it writes a few lines to the log file, and the attempts to start are happening in such quick succession that the log file is growing at the rate of several MB/minute, so unless action is taken the disk can be filled up.

To break out of the cycle repeatedly run:

net stop MongoDB

The problem is that this command only works when the service is in the Started state, which it is only very briefly before failing. So you have to keep running this command until you get lucky.

The other piece of information is that upon reboot the mongod.lock file is not empty. It contains one line of text with a number in it.

The two log files attached are the log file right after the service was started for the first time, and the first 20K of the ever growing log file after reboot.



 Comments   
Comment by Tom Robinson [ 22/Nov/13 ]

Is this related?
http://stackoverflow.com/questions/20142123/whats-the-safest-way-to-shut-down-mongodb-when-running-as-a-windows-service?noredirect=1#comment30022908_20142123

Comment by Tad Marshall [ 16/Jan/12 ]

@Zeng Jie – you mention version 2.0.2, but your log says 1.8.2. Some of what I see in the log file matches behavior that was fixed for 2.1.0 (see https://jira.mongodb.org/browse/SERVER-2833 ) but some fixes are in earlier versions, including the 2.0.2 you mentioned. The fix for SERVER-2833 makes sure that we initialize the thread that the service controller gives us so that shutdown doesn't generate errors, and logs the call from the service controller so that the log file makes clear how and why shutdown was triggered. It also closes a hole if Windows tries to close our console. I'd like to leave this ticket marked as Closed until 2.1.0 is ready and gets tested.

Comment by Donny V [ 25/Aug/11 ]

I'm still seeing this bug also when you stop the service. Doesn't seem to happen all the time though.

Log file:
------------------------------------------------------
Thu Aug 25 12:15:38 BackgroundJob starting: DataFileSync
Thu Aug 25 12:15:38 [initandlisten] MongoDB starting : pid=3724 port=27017 dbpath=C:\myiDV_Server\MongoDB\data\db 64-bit
Thu Aug 25 12:15:38 [initandlisten] db version v1.8.3, pdfile version 4.5
Thu Aug 25 12:15:38 [initandlisten] git version: c206d77e94bc3b65c76681df5a6b605f68a2de05
Thu Aug 25 12:15:38 [initandlisten] build sys info: windows (6, 1, 7601, 2, 'Service Pack 1') BOOST_LIB_VERSION=1_42
Thu Aug 25 12:15:38 [initandlisten] Accessing: local for the first time
Thu Aug 25 12:15:38 [initandlisten] query local.system.namespaces reslen:36 nreturned:0 0ms
Thu Aug 25 12:15:38 [initandlisten] enter repairDatabases (to check pdfile version #)
Thu Aug 25 12:15:38 [initandlisten] done repairDatabases
Thu Aug 25 12:15:38 BackgroundJob starting: snapshot
Thu Aug 25 12:15:38 [initandlisten] waiting for connections on port 27017
Thu Aug 25 12:15:38 BackgroundJob starting: ClientCursorMonitor
Thu Aug 25 12:15:38 [websvr] web admin interface listening on port 28017
Thu Aug 25 12:15:42 shutdown: going to close listening sockets...
Thu Aug 25 12:15:42 closing listening socket: 300
Thu Aug 25 12:15:42 closing listening socket: 304
Thu Aug 25 12:15:42 shutdown: going to flush diaglog...
Thu Aug 25 12:15:42 shutdown: going to close sockets...
Thu Aug 25 12:15:42 [websvr] Listener: accept() returns -1 errno:10038 An operation was attempted on something that is not a socket.
Thu Aug 25 12:15:42 [websvr] select() failure: ret=-1 errno:10038 An operation was attempted on something that is not a socket.
Thu Aug 25 12:15:42 shutdown: waiting for fs preallocator...
Thu Aug 25 12:15:42 shutdown: closing all files...
Thu Aug 25 12:15:42 closeAllFiles() finished
Thu Aug 25 12:15:42 shutdown: removing fs lock...
Thu Aug 25 12:15:42 [initandlisten] Listener: accept() returns -1 errno:10038 An operation was attempted on something that is not a socket.
Thu Aug 25 12:15:42 [initandlisten] select() failure: ret=-1 errno:10038 An operation was attempted on something that is not a socket.
Thu Aug 25 12:15:42 [initandlisten] now exiting
Thu Aug 25 12:15:42 dbexit:
Thu Aug 25 12:15:42 [initandlisten] shutdown: going to close listening sockets...
Thu Aug 25 12:15:42 [initandlisten] shutdown: going to flush diaglog...
Thu Aug 25 12:15:42 [initandlisten] shutdown: going to close sockets...
Thu Aug 25 12:15:42 [initandlisten] shutdown: waiting for fs preallocator...
Thu Aug 25 12:15:42 [initandlisten] shutdown: closing all files...
Thu Aug 25 12:15:42 closeAllFiles() finished
Thu Aug 25 12:15:42 [initandlisten] shutdown: removing fs lock...
Thu Aug 25 12:15:42 [initandlisten] couldn't remove fs lock 0
Thu Aug 25 12:15:42 dbexit: really exiting now

Comment by Justin Dearing [ 26/Apr/11 ]

Why does mongo not attempt to remove the lock file? Doesn't mongod hold an exclusive lock on the the lock file?

Comment by Robert Stam [ 26/Apr/11 ]

The fix that was made for this JIRA ticket is limited to ensuring that a clean shutdown of Windows results in a clean shutdown of MongoDB running as a service. An unclean shutdown of Windows or a crash of MongoDB will still leave the lock file in place resulting in this infinite loop on startup.

The infinite loop can be broken either by running "net stop MongoDB" until it takes effect, or by changing the service properties for MongoDB in the Services control panel to not restart the service on failure.

Comment by Paul C [ 26/Apr/11 ]

I'm still fighting the issue. My Mongo version is 1.8.1
Git version: a429cd4f535b2499cc4130b06ff7c26f41c00f04

Comment by auto [ 16/Mar/11 ]

Author:

{u'login': u'rstam', u'name': u'rstam', u'email': u'robert@10gen.com'}

Message: Fixed SERVER-2719, SERVER-1680 and SERVER-1434 (all had the same underlying cause).
https://github.com/mongodb/mongo/commit/e4df10c2e034e9cb25e0f8de13b5c5e9b00283a9

Comment by auto [ 15/Mar/11 ]

Author:

{u'login': u'rstam', u'name': u'rstam', u'email': u'robert@10gen.com'}

Message: Fixed SERVER-2719, SERVER-1680 and SERVER-1434 (all had the same underlying cause).
https://github.com/mongodb/mongo/commit/6ebf0c03153113927cf28dd5151bdd0cc9a2eead

Comment by Testo [ 10/Mar/11 ]

if you try the patch, it should work. It works for me after rebooting

Comment by Eliot Horowitz (Inactive) [ 09/Mar/11 ]

This does sound like its the root cause.
Not sure why its acting differently.

Can you try to fix?

Comment by Chris Westin [ 09/Mar/11 ]

In between other things, I fooled with this a bit and looked at a bit of code.

The service handler is calling db/instance.cpp/shutdownServer(). That function appears to do all the necessaries synchronously, and only after that does the service handler report success to its caller.

However, there are some discrepancies in the apparent code paths that are taken during the shutdown process. I tried shutting down in various ways, including explicitly (via net stop MongoDB), and rebooting (where the service manager should do the same thing). I've captured some log fragments (below) which show the results aren't always the same. The worse offender is the log fragment from rebooting, which seems to indicate the system just got a ctrl-c and ignored it, rather than shutting down. More puzzling is that this is what I see, even though I can't reproduce the problem. I start up just fine. But Robert (and these external cases) don't.

---- closing down foreground mongod process with ^C:
Tue Mar 08 13:25:13 Ctrl-C signal
Tue Mar 08 13:25:13 got kill or ctrl-c signal, will terminate after current cmd ends
Tue Mar 08 13:25:13 [ctrlCTerminate] now exiting
Tue Mar 08 13:25:13 dbexit:
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: going to close listening sockets...
Tue Mar 08 13:25:13 [ctrlCTerminate] closing listening socket: 252
Tue Mar 08 13:25:13 [ctrlCTerminate] closing listening socket: 256
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: going to flush diaglog...
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: going to close sockets...
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: waiting for fs preallocator...
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: lock for final commit...
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: final commit...
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: closing all files...
Tue Mar 08 13:25:13 closeAllFiles() finished
Tue Mar 08 13:25:13 [ctrlCTerminate] journalCleanup...
Tue Mar 08 13:25:13 [ctrlCTerminate] removeJournalFiles
Tue Mar 08 13:25:13 [ctrlCTerminate] shutdown: removing fs lock...
Tue Mar 08 13:25:13 dbexit: really exiting now
Tue Mar 08 13:37:54 dbexit:
Tue Mar 08 13:37:54 shutdown: going to close listening sockets...
Tue Mar 08 13:37:54 shutdown: going to flush diaglog...
Tue Mar 08 13:37:54 shutdown: going to close sockets...
Tue Mar 08 13:37:54 shutdown: waiting for fs preallocator...
Tue Mar 08 13:37:54 shutdown: lock for final commit...
Tue Mar 08 13:37:54 shutdown: final commit...
Tue Mar 08 13:37:54 shutdown: closing all files...
Tue Mar 08 13:37:54 closeAllFiles() finished
Tue Mar 08 13:37:54 dbexit: really exiting now

---- net stop MongoDB
Tue Mar 08 13:40:48 shutdown: going to close listening sockets...
Tue Mar 08 13:40:48 closing listening socket: 304
Tue Mar 08 13:40:48 closing listening socket: 308
Tue Mar 08 13:40:48 shutdown: going to flush diaglog...
Tue Mar 08 13:40:48 shutdown: going to close sockets...
Tue Mar 08 13:40:48 shutdown: waiting for fs preallocator...
Tue Mar 08 13:40:48 [initandlisten] Listener: accept() returns -1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 13:40:48 shutdown: lock for final commit...
Tue Mar 08 13:40:48 [websvr] Listener: accept() returns -1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 13:40:48 [initandlisten] select() failure: ret=-1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 13:40:48 [initandlisten] now exiting
Tue Mar 08 13:40:48 dbexit:
Tue Mar 08 13:40:48 [initandlisten] shutdown: going to close listening sockets...
Tue Mar 08 13:40:48 [initandlisten] shutdown: going to flush diaglog...
Tue Mar 08 13:40:48 [initandlisten] shutdown: going to close sockets...
Tue Mar 08 13:40:48 [websvr] select() failure: ret=-1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 13:40:48 [initandlisten] shutdown: waiting for fs preallocator...
Tue Mar 08 13:40:48 [initandlisten] shutdown: lock for final commit...
Tue Mar 08 13:40:48 [initandlisten] shutdown: final commit...
Tue Mar 08 13:40:48 [initandlisten] shutdown: closing all files...
Tue Mar 08 13:40:48 closeAllFiles() finished
Tue Mar 08 13:40:48 [initandlisten] journalCleanup...
Tue Mar 08 13:40:48 [initandlisten] removeJournalFiles
Tue Mar 08 13:40:48 [initandlisten] shutdown: removing fs lock...
Tue Mar 08 13:40:48 dbexit: really exiting now

accompanied by

C:\Windows\system32>net stop MongoDB
System error 109 has occurred.

The pipe has been ended.

C:\Windows\system32>

---- rebooting machine the machine
....
Tue Mar 08 14:06:03 CTRL_SHUTDOWN_EVENT signal (ignored)
Tue Mar 08 14:06:14 [dur] lsn set 1438310
Tue Mar 08 14:07:35 [initandlisten] MongoDB starting : pid=2008 port=27017 dbpath=c:/cygwin/home/cwestin/sandbox/dbfiles 64-bit
Tue Mar 08 14:07:36 [initandlisten] _DEBUG build (which is slower)
Tue Mar 08 14:07:36 [initandlisten] db version v1.9.0-pre-, pdfile version 4.5
Tue Mar 08 14:07:36 [initandlisten] git version: 93572fc05b83f3635abd1dc2d55aa81
....

---- mongod --remove
Tue Mar 08 16:28:10 shutdown: going to close listening sockets...
Tue Mar 08 16:28:10 closing listening socket: 300
Tue Mar 08 16:28:10 closing listening socket: 304
Tue Mar 08 16:28:10 shutdown: going to flush diaglog...
Tue Mar 08 16:28:10 shutdown: going to close sockets...
Tue Mar 08 16:28:10 [initandlisten] Listener: accept() returns -1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 16:28:10 shutdown: waiting for fs preallocator...
Tue Mar 08 16:28:10 shutdown: lock for final commit...
Tue Mar 08 16:28:10 shutdown: final commit...
Tue Mar 08 16:28:10 [initandlisten] select() failure: ret=-1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 16:28:10 [websvr] Listener: accept() returns -1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 16:28:10 [websvr] select() failure: ret=-1 errno:10038 An operation was attempted on something that is not a socket.
Tue Mar 08 16:28:10 [dur] lsn set 97920
Tue Mar 08 16:28:10 shutdown: closing all files...
Tue Mar 08 16:28:10 closeAllFiles() finished
Tue Mar 08 16:28:10 journalCleanup...
Tue Mar 08 16:28:10 [initandlisten] now exiting
Tue Mar 08 16:28:10 dbexit:
Tue Mar 08 16:28:10 [initandlisten] shutdown: going to close listening sockets...
Tue Mar 08 16:28:10 [initandlisten] shutdown: going to flush diaglog...
Tue Mar 08 16:28:10 [initandlisten] shutdown: going to close sockets...
Tue Mar 08 16:28:10 [initandlisten] shutdown: waiting for fs preallocator...
Tue Mar 08 16:28:10 [initandlisten] shutdown: lock for final commit...
Tue Mar 08 16:28:10 [initandlisten] shutdown: final commit...
Tue Mar 08 16:28:10 [initandlisten] shutdown: closing all files...
Tue Mar 08 16:28:10 closeAllFiles() finished
Tue Mar 08 16:28:10 [initandlisten] journalCleanup...
Tue Mar 08 16:28:10 removeJournalFiles
Tue Mar 08 16:28:10 shutdown: removing fs lock...
Tue Mar 08 16:28:10 [initandlisten] removeJournalFiles
Tue Mar 08 16:28:10 [initandlisten] shutdown: removing fs lock...
Tue Mar 08 16:28:10 [initandlisten] couldn't remove fs lock 0
Tue Mar 08 16:28:10 dbexit: really exiting now

and actually removes the lockfile, doesn't just truncate it as other things do


Comment by Chris Westin [ 09/Mar/11 ]

There's also this thread: http://thread.gmane.org/gmane.comp.db.mongodb.user/26312

Comment by Justin Dearing [ 09/Mar/11 ]

Sorry I see it

Comment by Justin Dearing [ 09/Mar/11 ]

Testo, did you mean to say you added a patch? If so where can I see it?

Comment by Testo [ 09/Mar/11 ]

add patch

Generated at Thu Feb 08 03:01:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.