SERVER-3152 (Core Server)

Segmentation fault after too many open files


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: 1.6.5
    • Component/s: Stability
    • Environment: Ubuntu 10.04 LTS on Amazon EC2 large instances, with data on EBS volumes under mdadm/LVM
    • Operating System: Linux

    Description

      The primary node of a replica set ran out of file descriptors. `ulimit -n` reports a limit of 1024 open files:

      root@m2:~# ulimit -n
      1024

      and the error was logged millions of times:

      root@m2:~# grep "Too many open files" /var/log/mongodb/mongodb.log | wc -l
      6574844

      In the end this resulted in a segfault, shown in the tail of the log:

      Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
      [repeated about 6.5 million times]

      Thu May 26 20:27:35 [conn1126823] Uncaught std::exception: boost::filesystem::basic_directory_iterator constructor: Too many open files: "/mnt/mongo/_tmp/esort.1306441653.672251719/", terminating
      Thu May 26 20:27:35 dbexit:

      Thu May 26 20:27:35 [conn1126823] shutdown: going to close listening sockets...
      Thu May 26 20:27:35 [conn1126823] closing listening socket: 18
      Thu May 26 20:27:35 [conn1126823] closing listening socket: 20
      Thu May 26 20:27:35 [conn1126823] shutdown: going to flush oplog...
      Thu May 26 20:27:35 [conn1126823] shutdown: going to close sockets...
      Thu May 26 20:27:35 [conn1126823] shutdown: waiting for fs preallocator...
      Thu May 26 20:27:35 [conn1126823] shutdown: closing all files...
      Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
      Thu May 26 20:27:35 [conn8] end connection 127.0.0.1:55336
      Thu May 26 20:27:35 [conn1127914] assertion 11600 interrupted at shutdown ns:synth.current query:{ query: {}, $snapshot: true }
      Thu May 26 20:27:35 [conn1127914] query synth.current exception 1214ms
      Thu May 26 20:27:35 [conn1127914] SocketException in connThread, closing client connection
      Thu May 26 20:27:35 [conn7] end connection 127.0.0.1:55335
      Thu May 26 20:27:35 ERROR: Client::shutdown not called: slaveTracking
      Thu May 26 20:27:35 Got signal: 11 (Segmentation fault).

      Thu May 26 20:27:35 [conn52293] end connection 10.254.238.86:54967
      Thu May 26 20:27:35 Backtrace:
      0x824629 0x7f3429c7eaf0 0x6e1e90 0x6e1efa 0x7247af 0x6c19d7 0x6bcdb2 0x6206cd 0x622b6c 0x705ba4 0x70adf2 0x70b494 0x5588a2 0x7a287c 0x797596 0x798538 0x5fb7e5 0x60029f 0x7074ba 0x70aaf6
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x824629]
      /lib/libc.so.6(+0x33af0) [0x7f3429c7eaf0]
      /usr/bin/mongod(_ZN5mongo16NamespaceDetails6_allocEPKci+0) [0x6e1e90]
      /usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE+0x3a) [0x6e1efa]
      /usr/bin/mongod(_ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci+0x17f) [0x7247af]
      /usr/bin/mongod() [0x6c19d7]
      /usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x6bcdb2]
      /usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1dfd) [0x6206cd]
      /usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x11c) [0x622b6c]
      /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4d4) [0x705ba4]
      /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x17d2) [0x70adf2]
      /usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageE+0x64) [0x70b494]
      /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x2a2) [0x5588a2]
      /usr/bin/mongod(_ZN5mongo16CmdFindAndModify3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x140c) [0x7a287c]
      /usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa16) [0x797596]
      /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x798) [0x798538]
      /usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x5fb7e5]
      /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1bbf) [0x60029f]
      /usr/bin/mongod() [0x7074ba]
      /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x14d6) [0x70aaf6]

      Thu May 26 20:27:35 dbexit: ; exiting immediately

      Thu May 26 20:27:35 ERROR: Client::~Client _context should be null but is not; client:conn
      Thu May 26 20:27:35 [conn31] end connection 127.0.0.1:55337
      Thu May 26 20:27:36 [conn50043] end connection 10.86.197.56:48503
      Thu May 26 20:27:36 [conn430] end connection 10.212.71.69:40076
      Thu May 26 20:27:37 [conn52518] end connection 10.198.107.95:58198
      50/245 20%
      88/245 35%
      Thu May 26 20:27:39 Got signal: 11 (Segmentation fault).

      Thu May 26 20:27:39 Backtrace:
      0x824629 0x7f3429c7eaf0 0x52c2b5 0x701d57 0x702551 0x827a08 0x83a4b0 0x7f342a7829ca 0x7f3429d3170d
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x824629]
      /lib/libc.so.6(+0x33af0) [0x7f3429c7eaf0]
      /usr/bin/mongod(_ZN5mongo9MongoFile13closeAllFilesERSt18basic_stringstreamIcSt11char_traitsIcESaIcEE+0xa5) [0x52c2b5]
      /usr/bin/mongod(_ZN5mongo8shutdownEv+0x3a7) [0x701d57]
      /usr/bin/mongod(_ZN5mongo6dbexitENS_8ExitCodeEPKc+0x201) [0x702551]
      /usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x13f8) [0x827a08]
      /usr/bin/mongod(thread_proxy+0x80) [0x83a4b0]
      /lib/libpthread.so.0(+0x69ca) [0x7f342a7829ca]
      /lib/libc.so.6(clone+0x6d) [0x7f3429d3170d]
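
      To check whether the running mongod itself, and not only the root shell, is capped at the 1024 descriptors that `ulimit -n` reports above, its limit and current usage can be read from /proc. This is a minimal sketch, assuming Linux's /proc/PID/limits format and root access to the mongod process; `fd_usage` is a hypothetical helper name, not part of MongoDB:

      ```shell
      # fd_usage PID: print "open/soft-limit" descriptor counts for a process.
      # Hypothetical helper; assumes Linux /proc. Reading another user's
      # /proc/PID/fd requires root.
      fd_usage() {
        pid=$1
        if [ -z "$pid" ] || [ ! -d "/proc/$pid" ]; then
          echo "fd_usage: no such process: $pid" >&2
          return 1
        fi
        # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit
        soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
        # Each entry in /proc/PID/fd is one open descriptor
        open=$(ls "/proc/$pid/fd" | wc -l)
        echo "$open/$soft"
      }
      ```

      For example, running `fd_usage "$(pidof -s mongod)"` periodically would show how close the process gets to its limit before accept() starts failing with errno 24 (EMFILE, "Too many open files").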

      The node was unresponsive at least from the first occurrence of the error:

      Thu May 26 20:25:06 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files

      until the last:

      Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files

      and very likely from about 10 minutes before the first one. Yet the replica set failed over only at the time of the segfault.
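
      The first and last occurrences quoted above can be pulled out of a multi-million-line log without paging through it. This is a minimal sketch, assuming GNU grep and coreutils `tac` as shipped with Ubuntu; `first_last` is a hypothetical helper name:

      ```shell
      # first_last PATTERN FILE: print the first and the last line matching PATTERN.
      # Hypothetical helper; assumes GNU grep (-m) and coreutils tac.
      first_last() {
        # Stops reading at the first match from the top of the file
        grep -m 1 -e "$1" -- "$2"
        # tac reads the file from the end, so the first match is the last occurrence
        tac -- "$2" | grep -m 1 -e "$1"
      }
      ```

      Usage: `first_last 'Too many open files' /var/log/mongodb/mongodb.log`. Reading backwards with `tac` avoids piping every matching line through `tail`.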

      Do let me know if you need more information...


          People

              Assignee: Unassigned
              Reporter: Pieter Ennes (skion)
              Votes: 1
              Watchers: 3
