Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-11229

Unclean mongod shutdown when slow flush is in progress

    XMLWordPrintableJSON

Details

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major - P3 Major - P3
    • None
    • 2.4.3
    • Admin
    • None
    • ALL

    Description

      When a slow flush operation is in progress, terminating MongoD with SIGTERM can result in an unclean shut down:

      Mon Oct 14 23:35:31.453 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
      Mon Oct 14 23:35:31.454 [signalProcessingThread] now exiting
      Mon Oct 14 23:35:31.454 dbexit: 
      Mon Oct 14 23:35:31.454 [signalProcessingThread] shutdown: going to close listening sockets...
      Mon Oct 14 23:35:31.454 [signalProcessingThread] closing listening socket: 34
      Mon Oct 14 23:35:31.454 [signalProcessingThread] closing listening socket: 35
      Mon Oct 14 23:35:31.454 [signalProcessingThread] shutdown: going to flush diaglog...
      Mon Oct 14 23:35:31.454 [signalProcessingThread] shutdown: going to close sockets...
      Mon Oct 14 23:35:31.454 [signalProcessingThread] shutdown: waiting for fs preallocator...
      Mon Oct 14 23:35:31.454 [signalProcessingThread] shutdown: lock for final commit...
      Mon Oct 14 23:35:31.454 [signalProcessingThread] shutdown: final commit...
      Mon Oct 14 23:35:31.464 [rsBackgroundSync] Socket recv() errno:9 Bad file descriptor IP:27017
      Mon Oct 14 23:35:31.464 [rsBackgroundSync] SocketException: remote: IP:27017 error: 9001 socket exception [1] server [IP:27017] 
      Mon Oct 14 23:35:31.464 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: IP:27017
      Mon Oct 14 23:35:31.850 [conn91747] end connection XXX:53591 (60 connections now open)
      Mon Oct 14 23:35:32.491 [conn91742] end connection XXX:59239 (59 connections now open)
      Mon Oct 14 23:35:32.530 [conn91743] end connection XXX:39650 (58 connections now open)
      Mon Oct 14 23:35:32.716 [conn91744] end connection XXX:36543 (57 connections now open)
      Mon Oct 14 23:35:46.320 [conn91552] end connection XXX:41064 (56 connections now open)
      Mon Oct 14 23:35:46.716 [conn91563] end connection XXX:41163 (55 connections now open)
      Mon Oct 14 23:37:01.239 [signalProcessingThread] shutdown: closing all files...
      Mon Oct 14 23:37:01.239 [DataFileSync] flushing mmaps took 627745ms  for 317 files
      Mon Oct 14 23:37:01.239 [DataFileSync] ERROR: Client::shutdown not called: DataFileSync
       
       
      ***** SERVER RESTARTED *****
       
       
      Mon Oct 14 23:43:32.090 [initandlisten] MongoDB starting : pid=32018 port=27017 dbpath=/db/data 64-bit host=XXX
      Mon Oct 14 23:43:32.091 [initandlisten] db version v2.4.3
      Mon Oct 14 23:43:32.091 [initandlisten] git version: 37efc4cff49790b5080682772a71c747d449a54d
      Mon Oct 14 23:43:32.091 [initandlisten] build info: Linux ip-a-b-c-d 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
      Mon Oct 14 23:43:32.091 [initandlisten] allocator: tcmalloc
      Mon Oct 14 23:43:32.091 [initandlisten] options: { bind_ip: "XXX", config: "/conf/mongodb.cnf", dbpath: "/db/data", directoryperdb: "true", fork: "true", logappend: "true", logpath: "/log/mongodb.log", noauth: "true", nounixsocket: "true", pidfilepath: "/var/mongodb.pid", port: 27017, replSet: "XXX", rest: "true" }
      Mon Oct 14 23:43:32.113 [initandlisten] journal dir=/db/data/journal
      Mon Oct 14 23:43:32.113 [initandlisten] recover begin

      While it is absolutely not normal to have flush times of 600+ seconds, we should be either smarter about waiting for the ongoing flush to complete before shut down, or just print a meaningful error message.

      Attachments

        Activity

          People

            Unassigned Unassigned
            alex.komyagin@mongodb.com Alexander Komyagin
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: