Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-21150

Basic startup logging should be done as early as possible in initAndListen

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.0.8, 3.2.0-rc2
    • Affects Version/s: 3.0.1, 3.2.0-rc0
    • Component/s: Internal Code
    • None
    • Fully Compatible
    • ALL

      The "MongoDB starting" logline (and related lines) are logged when mongod/mongos starts up, and provide basic information about the environment in use that is invaluable when diagnosing issues:

      $ /m/3.0.0/bin/mongod --port 12345
      2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] MongoDB starting : pid=29676 port=12345 dbpath=/data/db 64-bit host=genique
      2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] db version v3.0.0
      2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] git version: a841fd6394365954886924a35076691b4d149168
      2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] build info: Linux build15.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] allocator: tcmalloc
      2015-10-27T10:48:11.823+1100 I CONTROL  [initandlisten] options: { net: { port: 12345 } }
      2015-10-27T10:48:11.848+1100 I JOURNAL  [initandlisten] journal dir=/data/db/journal
      2015-10-27T10:48:11.848+1100 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
      2015-10-27T10:48:11.981+1100 I JOURNAL  [durability] Durability thread started
      2015-10-27T10:48:11.981+1100 I JOURNAL  [journal writer] Journal writer thread started
      2015-10-27T10:48:11.987+1100 I NETWORK  [initandlisten] waiting for connections on port 12345
      

      However, since 3.0.1, much initialisation work is done ahead of these basic log lines. Specifically, storage engine initialisation (which includes journal replay), and setting up sockets (which is worse as of 3.1.9).

      $ /m/3.0.1/bin/mongod --port 12345
      2015-10-27T10:48:20.026+1100 I JOURNAL  [initandlisten] journal dir=/data/db/journal
      2015-10-27T10:48:20.027+1100 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
      2015-10-27T10:48:20.160+1100 I JOURNAL  [durability] Durability thread started
      2015-10-27T10:48:20.160+1100 I JOURNAL  [journal writer] Journal writer thread started
      2015-10-27T10:48:20.161+1100 I CONTROL  [initandlisten] MongoDB starting : pid=29829 port=12345 dbpath=/data/db 64-bit host=genique
      2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] db version v3.0.1
      2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] git version: 534b5a3f9d10f00cd27737fbcd951032248b5952
      2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] build info: Linux build16.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] allocator: tcmalloc
      2015-10-27T10:48:20.162+1100 I CONTROL  [initandlisten] options: { net: { port: 12345 } }
      2015-10-27T10:48:20.188+1100 I NETWORK  [initandlisten] waiting for connections on port 12345
      

      While I understand the need to initialise sockets and the storage engine as early as possible, it can be very problematic for these to be done ahead of basic logging of things like version number, build info, and options. This is because there are ample opportunities where the server can log errors and exit before the basic environment information has been logged.

      For example, compare the behaviour when the port is already in use:

      $ /m/3.2.0-rc0/bin/mongod
      2015-10-27T10:41:55.964+1100 E NETWORK  [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
      2015-10-27T10:41:55.964+1100 E NETWORK  [initandlisten]   addr already in use
      2015-10-27T10:41:55.964+1100 E STORAGE  [initandlisten] Failed to set up sockets during startup.
      2015-10-27T10:41:55.964+1100 I CONTROL  [initandlisten] dbexit:  rc: 48
      $ /m/3.0.0/bin/mongod
      2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] MongoDB starting : pid=17987 port=27017 dbpath=/data/db 64-bit host=genique
      2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] db version v3.0.0
      2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] git version: a841fd6394365954886924a35076691b4d149168
      2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] build info: Linux build15.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
      2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] allocator: tcmalloc
      2015-10-27T11:11:21.686+1100 I CONTROL  [initandlisten] options: {}
      2015-10-27T11:11:21.711+1100 E NETWORK  [initandlisten] listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
      2015-10-27T11:11:21.711+1100 E NETWORK  [initandlisten]   addr already in use
      2015-10-27T11:11:21.718+1100 I JOURNAL  [initandlisten] journal dir=/data/db/journal
      2015-10-27T11:11:21.718+1100 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
      2015-10-27T11:11:21.864+1100 I JOURNAL  [durability] Durability thread started
      2015-10-27T11:11:21.864+1100 I JOURNAL  [journal writer] Journal writer thread started
      2015-10-27T11:11:21.871+1100 I CONTROL  [initandlisten] now exiting
      2015-10-27T11:11:21.871+1100 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
      2015-10-27T11:11:21.871+1100 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
      2015-10-27T11:11:21.871+1100 I NETWORK  [initandlisten] shutdown: going to close sockets...
      2015-10-27T11:11:21.872+1100 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
      2015-10-27T11:11:21.872+1100 I STORAGE  [initandlisten] shutdown: final commit...
      2015-10-27T11:11:21.876+1100 I JOURNAL  [initandlisten] journalCleanup...
      2015-10-27T11:11:21.876+1100 I JOURNAL  [initandlisten] removeJournalFiles
      2015-10-27T11:11:21.878+1100 I JOURNAL  [initandlisten] Terminating durability thread ...
      2015-10-27T11:11:21.974+1100 I JOURNAL  [journal writer] Journal writer thread stopped
      2015-10-27T11:11:21.974+1100 I JOURNAL  [durability] Durability thread stopped
      2015-10-27T11:11:21.975+1100 I STORAGE  [initandlisten] shutdown: closing all files...
      2015-10-27T11:11:21.976+1100 I STORAGE  [initandlisten] closeAllFiles() finished
      2015-10-27T11:11:21.976+1100 I STORAGE  [initandlisten] shutdown: removing fs lock...
      2015-10-27T11:11:21.976+1100 I CONTROL  [initandlisten] dbexit:  rc: 48
      

      And the following actual (full) logfiles from the field, which contain no useful context about version, build, etc. This means that support engineers are left hunting for crumbs of information, like the timestamp/severity/component format indicating 3.0.x, and the Hotfix line and backslashes in paths indicating Windows.

      2015-10-20T09:09:31.180-0500 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
      2015-10-20T09:09:31.204-0500 I JOURNAL  [initandlisten] journal dir=db\journal
      2015-10-20T09:09:31.205-0500 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
      2015-10-20T09:09:31.205-0500 I JOURNAL  [initandlisten]
      2015-10-20T09:09:31.206-0500 E JOURNAL  [initandlisten] Insufficient free space for journal files
      2015-10-20T09:09:31.206-0500 I JOURNAL  [initandlisten] Please make at least 3379MB available in db\journal or use --smallfiles
      2015-10-20T09:09:31.206-0500 I JOURNAL  [initandlisten]
      2015-10-20T09:09:31.207-0500 I STORAGE  [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
      2015-10-20T09:09:31.207-0500 I CONTROL  [initandlisten] now exiting
      2015-10-20T09:09:31.207-0500 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
      2015-10-20T09:09:31.207-0500 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
      2015-10-20T09:09:31.208-0500 I NETWORK  [initandlisten] shutdown: going to close sockets...
      2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
      2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] shutdown: final commit...
      2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] shutdown: closing all files...
      2015-10-20T09:09:31.208-0500 I STORAGE  [initandlisten] closeAllFiles() finished
      2015-10-20T09:09:31.208-0500 I CONTROL  [initandlisten] dbexit:  rc: 100
      
      2015-10-21T08:39:23.982-0500 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
      2015-10-21T08:39:23.982-0500 W -        [initandlisten] Detected unclean shutdown - db\mongod.lock is not empty.
      2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] journal dir=db\journal
      2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover begin
      2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover lsn: 6474133
      2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover db\journal\j._2
      2015-10-21T08:39:23.983-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5634193 < lsn:6474133
      2015-10-21T08:39:24.013-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5694183 < lsn:6474133
      2015-10-21T08:39:24.606-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5754183 < lsn:6474133
      2015-10-21T08:39:25.650-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5814183 < lsn:6474133
      2015-10-21T08:39:26.261-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5874173 < lsn:6474133
      2015-10-21T08:39:26.977-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5934173 < lsn:6474133
      2015-10-21T08:39:27.486-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:5994163 < lsn:6474133
      2015-10-21T08:39:28.106-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:6054163 < lsn:6474133
      2015-10-21T08:39:28.605-0500 I JOURNAL  [initandlisten] recover skipping application of section seq:6114163 < lsn:6474133
      2015-10-21T08:39:29.363-0500 I JOURNAL  [initandlisten] recover skipping application of section more...
      2015-10-21T08:39:29.988-0500 I JOURNAL  [initandlisten] recover db\journal\j._3
      2015-10-21T08:39:42.883-0500 I JOURNAL  [initandlisten] recover cleaning up
      2015-10-21T08:39:42.884-0500 I JOURNAL  [initandlisten] removeJournalFiles
      2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten] recover done
      2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten]
      2015-10-21T08:39:42.921-0500 E JOURNAL  [initandlisten] Insufficient free space for journal files
      2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten] Please make at least 422MB available in db\journal or use --smallfiles
      2015-10-21T08:39:42.921-0500 I JOURNAL  [initandlisten]
      2015-10-21T08:39:42.921-0500 I STORAGE  [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
      2015-10-21T08:39:42.922-0500 I CONTROL  [initandlisten] now exiting
      2015-10-21T08:39:42.922-0500 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
      2015-10-21T08:39:42.922-0500 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
      2015-10-21T08:39:42.922-0500 I NETWORK  [initandlisten] shutdown: going to close sockets...
      2015-10-21T08:39:42.922-0500 I STORAGE  [initandlisten] shutdown: waiting for fs preallocator...
      2015-10-21T08:39:42.922-0500 I STORAGE  [initandlisten] shutdown: final commit...
      2015-10-21T08:39:42.922-0500 I STORAGE  [initandlisten] shutdown: closing all files...
      2015-10-21T08:39:42.923-0500 I STORAGE  [initandlisten] closeAllFiles() finished
      2015-10-21T08:39:42.923-0500 I CONTROL  [initandlisten] dbexit:  rc: 100
      
      2015-10-22T20:24:06.133-0500 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
      2015-10-22T20:24:06.134-0500 W -        [initandlisten] Detected unclean shutdown - db\mongod.lock is not empty.
      2015-10-22T20:24:06.142-0500 I STORAGE  [initandlisten] **************
      old lock file: db\mongod.lock.  probably means unclean shutdown,
      but there are no journal files to recover.
      this is likely human error or filesystem corruption.
      please make sure that your journal directory is mounted.
      found 4 dbs.
      see: http://dochub.mongodb.org/core/repair for more information
      *************
      2015-10-22T20:24:06.147-0500 I STORAGE  [initandlisten] exception in initAndListen: 12596 old lock file, terminating
      2015-10-22T20:24:06.149-0500 I CONTROL  [initandlisten] dbexit:  rc: 100
      

      Logging this very basic information ahead of initialising sockets or storage should not deprive those systems of any significant resources. Is there any reason it should not be done first thing inside _initAndListen? As long as initialising sockets comes immediately after, followed immediately by storage engine init, then this should be fine, right?

            Assignee:
            kevin.pulo@mongodb.com Kevin Pulo
            Reporter:
            kevin.pulo@mongodb.com Kevin Pulo
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: