[SERVER-21150] Basic startup logging should be done as early as possible in initAndListen Created: 27/Oct/15  Updated: 10/Dec/15  Resolved: 30/Oct/15

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.0.1, 3.2.0-rc0
Fix Version/s: 3.0.8, 3.2.0-rc2

Type: Bug Priority: Major - P3
Reporter: Kevin Pulo Assignee: Kevin Pulo
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

The "MongoDB starting" logline (and related lines) are logged when mongod/mongos starts up, and provide basic information about the environment in use that is invaluable when diagnosing issues:

$ /m/3.0.0/bin/mongod --port 12345
2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] MongoDB starting : pid=29676 port=12345 dbpath=/data/db 64-bit host=genique
2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] db version v3.0.0
2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] git version: a841fd6394365954886924a35076691b4d149168
2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] build info: Linux build15.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] allocator: tcmalloc
2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] options: { net: { port: 12345 } }
2015-10-27T10:48:11.848+1100 I JOURNAL  [initandlisten] journal dir=/data/db/journal
2015-10-27T10:48:11.848+1100 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2015-10-27T10:48:11.981+1100 I JOURNAL  [durability] Durability thread started
2015-10-27T10:48:11.981+1100 I JOURNAL  [journal writer] Journal writer thread started
2015-10-27T10:48:11.987+1100 I NETWORK  [initandlisten] waiting for connections on port 12345

However, since 3.0.1, much initialisation work is done ahead of these basic log lines. Specifically, storage engine initialisation (which includes journal replay), and setting up sockets (which is worse as of 3.1.9).

$ /m/3.0.1/bin/mongod --port 12345
2015-10-27T10:48:20.026+1100 I JOURNAL  [initandlisten] journal dir=/data/db/journal
2015-10-27T10:48:20.027+1100 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2015-10-27T10:48:20.160+1100 I JOURNAL  [durability] Durability thread started
2015-10-27T10:48:20.160+1100 I JOURNAL  [journal writer] Journal writer thread started
2015-10-27T10:48:20.161+1100 I CONTROL  [initandlisten] MongoDB starting : pid=29829 port=12345 dbpath=/data/db 64-bit host=genique
2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] db version v3.0.1
2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] git version: 534b5a3f9d10f00cd27737fbcd951032248b5952
2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] build info: Linux build16.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] allocator: tcmalloc
2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] options: { net: { port: 12345 } }
2015-10-27T10:48:20.188+1100 I NETWORK  [initandlisten] waiting for connections on port 12345

While I understand the need to initialise sockets and the storage engine as early as possible, it can be very problematic for these to be done ahead of basic logging of things like version number, build info, and options. This is because there are ample opportunities where the server can log errors and exit before the basic environment information has been logged.

For example, compare the behaviour when the port is already in use:

$ /m/3.2.0-rc0/bin/mongod
2015-10-27T10:41:55.964+1100 E NETWORK  [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
2015-10-27T10:41:55.964+1100 E NETWORK  [initandlisten]   addr already in use
2015-10-27T10:41:55.964+1100 E STORAGE  [initandlisten] Failed to set up sockets during startup.
2015-10-27T10:41:55.964+1100 I CONTROL  [initandlisten] dbexit:  rc: 48
$ /m/3.0.0/bin/mongod
2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] MongoDB starting : pid=17987 port=27017 dbpath=/data/db 64-bit host=genique
2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] db version v3.0.0
2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] git version: a841fd6394365954886924a35076691b4d149168
2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] build info: Linux build15.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] allocator: tcmalloc
2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] options: {}
2015-10-27T11:11:21.711+1100 E NETWORK  [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
2015-10-27T11:11:21.711+1100 E NETWORK  [initandlisten]   addr already in use
2015-10-27T11:11:21.718+1100 I JOURNAL  [initandlisten] journal dir=/data/db/journal
2015-10-27T11:11:21.718+1100 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2015-10-27T11:11:21.864+1100 I JOURNAL  [durability] Durability thread started
2015-10-27T11:11:21.864+1100 I JOURNAL  [journal writer] Journal writer thread started
2015-10-27T11:11:21.871+1100 I CONTROL  [initandlisten] now exiting
2015-10-27T11:11:21.871+1100 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2015-10-27T11:11:21.871+1100 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2015-10-27T11:11:21.871+1100 I NETWORK  [initandlisten] shutdown: going to close sockets...
2015-10-27T11:11:21.872+1100 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
2015-10-27T11:11:21.872+1100 I STORAGE  [initandlisten] shutdown: final commit...
2015-10-27T11:11:21.876+1100 I JOURNAL  [initandlisten] journalCleanup...
2015-10-27T11:11:21.876+1100 I JOURNAL  [initandlisten] removeJournalFiles
2015-10-27T11:11:21.878+1100 I JOURNAL  [initandlisten] Terminating durability thread ...
2015-10-27T11:11:21.974+1100 I JOURNAL  [journal writer] Journal writer thread stopped
2015-10-27T11:11:21.974+1100 I JOURNAL  [durability] Durability thread stopped
2015-10-27T11:11:21.975+1100 I STORAGE  [initandlisten] shutdown: closing all files...
2015-10-27T11:11:21.976+1100 I STORAGE  [initandlisten] closeAllFiles() finished
2015-10-27T11:11:21.976+1100 I STORAGE  [initandlisten] shutdown: removing fs lock...
2015-10-27T11:11:21.976+1100 I CONTROL  [initandlisten] dbexit:  rc: 48

And the following actual (full) logfiles from the field, which contain no useful context about version, build, etc. This means that support engineers are left hunting for crumbs of information, like the timestamp/severity/component format indicating 3.0.x, and the Hotfix line and backslashes in paths indicating Windows.

2015-10-20T09:09:31.180-0500 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
2015-10-20T09:09:31.204-0500 I JOURNAL  [initandlisten] journal dir=db\journal
2015-10-20T09:09:31.205-0500 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
2015-10-20T09:09:31.205-0500 I JOURNAL  [initandlisten]
2015-10-20T09:09:31.206-0500 E JOURNAL  [initandlisten] Insufficient free space for journal files
2015-10-20T09:09:31.206-0500 I JOURNAL  [initandlisten] Please make at least 3379MB available in db\journal or use --smallfiles
2015-10-20T09:09:31.206-0500 I JOURNAL  [initandlisten]
2015-10-20T09:09:31.207-0500 I STORAGE  [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
2015-10-20T09:09:31.207-0500 I CONTROL  [initandlisten] now exiting
2015-10-20T09:09:31.207-0500 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2015-10-20T09:09:31.207-0500 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2015-10-20T09:09:31.208-0500 I NETWORK  [initandlisten] shutdown: going to close sockets...
2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] shutdown: final commit...
2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] shutdown: closing all files...
2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] closeAllFiles() finished
2015-10-20T09:09:31.208-0500 I CONTROL  [initandlisten] dbexit:  rc: 100

2015-10-21T08:39:23.982-0500 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
2015-10-21T08:39:23.982-0500 W -        [initandlisten] Detected unclean shutdown - db\mongod.lock is not empty.
2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] journal dir=db\journal
2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover begin
2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover lsn: 6474133
2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover db\journal\j._2
2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5634193 < lsn:6474133
2015-10-21T08:39:24.013-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5694183 < lsn:6474133
2015-10-21T08:39:24.606-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5754183 < lsn:6474133
2015-10-21T08:39:25.650-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5814183 < lsn:6474133
2015-10-21T08:39:26.261-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5874173 < lsn:6474133
2015-10-21T08:39:26.977-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5934173 < lsn:6474133
2015-10-21T08:39:27.486-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5994163 < lsn:6474133
2015-10-21T08:39:28.106-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:6054163 < lsn:6474133
2015-10-21T08:39:28.605-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:6114163 < lsn:6474133
2015-10-21T08:39:29.363-0500 I JOURNAL  [initandlisten] recover skipping application of section more...
2015-10-21T08:39:29.988-0500 I JOURNAL  [initandlisten] recover db\journal\j._3
2015-10-21T08:39:42.883-0500 I JOURNAL  [initandlisten] recover cleaning up
2015-10-21T08:39:42.884-0500 I JOURNAL  [initandlisten] removeJournalFiles
2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten] recover done
2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten]
2015-10-21T08:39:42.921-0500 E JOURNAL  [initandlisten] Insufficient free space for journal files
2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten] Please make at least 422MB available in db\journal or use --smallfiles
2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten]
2015-10-21T08:39:42.921-0500 I STORAGE  [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
2015-10-21T08:39:42.922-0500 I CONTROL  [initandlisten] now exiting
2015-10-21T08:39:42.922-0500 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2015-10-21T08:39:42.922-0500 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2015-10-21T08:39:42.922-0500 I NETWORK  [initandlisten] shutdown: going to close sockets...
2015-10-21T08:39:42.922-0500 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
2015-10-21T08:39:42.922-0500 I STORAGE  [initandlisten] shutdown: final commit...
2015-10-21T08:39:42.922-0500 I STORAGE  [initandlisten] shutdown: closing all files...
2015-10-21T08:39:42.923-0500 I STORAGE  [initandlisten] closeAllFiles() finished
2015-10-21T08:39:42.923-0500 I CONTROL  [initandlisten] dbexit:  rc: 100

2015-10-22T20:24:06.133-0500 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
2015-10-22T20:24:06.134-0500 W -        [initandlisten] Detected unclean shutdown - db\mongod.lock is not empty.
2015-10-22T20:24:06.142-0500 I STORAGE  [initandlisten] **************
old lock file: db\mongod.lock.  probably means unclean shutdown,
but there are no journal files to recover.
this is likely human error or filesystem corruption.
please make sure that your journal directory is mounted.
found 4 dbs.
see: http://dochub.mongodb.org/core/repair for more information
*************
2015-10-22T20:24:06.147-0500 I STORAGE  [initandlisten] exception in initAndListen: 12596 old lock file, terminating
2015-10-22T20:24:06.149-0500 I CONTROL  [initandlisten] dbexit:  rc: 100

Logging this very basic information ahead of initialising sockets or storage should not deprive those systems of any significant resources. Is there any reason it should not be done first thing inside _initAndListen? As long as initialising sockets comes immediately after, followed immediately by storage engine init, then this should be fine, right?



 Comments   
Comment by Githook User [ 03/Nov/15 ]

Author:

{u'username': u'devkev', u'name': u'Kevin Pulo', u'email': u'kevin.pulo@mongodb.com'}

Message: SERVER-21150 move basic init log messages earlier

Specifically, ahead of socket and storage engine initialisation
Branch: v3.0
https://github.com/mongodb/mongo/commit/12afbfc77916cf720316435cd9396f5c293db6c2

Comment by Githook User [ 30/Oct/15 ]

Author:

{u'username': u'devkev', u'name': u'Kevin Pulo', u'email': u'kevin.pulo@mongodb.com'}

Message: SERVER-21150 move basic init log messages earlier

Specifically, ahead of socket and storage engine initialisation
Branch: master
https://github.com/mongodb/mongo/commit/a57f58a7d2a00a1b737978dae7a2f1485df4636c

Generated at Thu Feb 08 03:56:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.