Core Server / SERVER-3152

Segmentation fault after too many open files

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • Affects Version/s: 1.6.5
    • Component/s: Stability
    • Labels:
      None
    • Environment:
      Ubuntu 10.4 LTS on Amazon EC2 large nodes on mdadm/lvm EBS disks
    • Linux

      The primary node of a replica set ran out of file descriptors, which was logged many times:

      root@m2:~# ulimit -n
      1024

      root@m2:~# grep "Too many open files" /var/log/mongodb/mongodb.log | wc -l
      6574844

      but in the end this resulted in a segfault, as shown in the tail of the log:

      Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
      [6M times]

      Thu May 26 20:27:35 [conn1126823] Uncaught std::exception: boost::filesystem::basic_directory_iterator constructor: Too many open files: "/mnt/mongo/_tmp/esort.1306441653.672251719/", terminating
      Thu May 26 20:27:35 dbexit:

      Thu May 26 20:27:35 [conn1126823] shutdown: going to close listening sockets...
      Thu May 26 20:27:35 [conn1126823] closing listening socket: 18
      Thu May 26 20:27:35 [conn1126823] closing listening socket: 20
      Thu May 26 20:27:35 [conn1126823] shutdown: going to flush oplog...
      Thu May 26 20:27:35 [conn1126823] shutdown: going to close sockets...
      Thu May 26 20:27:35 [conn1126823] shutdown: waiting for fs preallocator...
      Thu May 26 20:27:35 [conn1126823] shutdown: closing all files...
      Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
      Thu May 26 20:27:35 [conn8] end connection 127.0.0.1:55336
      Thu May 26 20:27:35 [conn1127914] assertion 11600 interrupted at shutdown ns:synth.current query:{ query: {}, $snapshot: true }
      Thu May 26 20:27:35 [conn1127914] query synth.current exception 1214ms
      Thu May 26 20:27:35 [conn1127914] SocketException in connThread, closing client connection
      Thu May 26 20:27:35 [conn7] end connection 127.0.0.1:55335
      Thu May 26 20:27:35 ERROR: Client::shutdown not called: slaveTracking
      Thu May 26 20:27:35 Got signal: 11 (Segmentation fault).

      Thu May 26 20:27:35 [conn52293] end connection 10.254.238.86:54967
      Thu May 26 20:27:35 Backtrace:
      0x824629 0x7f3429c7eaf0 0x6e1e90 0x6e1efa 0x7247af 0x6c19d7 0x6bcdb2 0x6206cd 0x622b6c 0x705ba4 0x70adf2 0x70b494 0x5588a2 0x7a287c 0x797596 0x798538 0x5fb7e5 0x60029f 0x7074ba 0x70aaf6
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x824629]
      /lib/libc.so.6(+0x33af0) [0x7f3429c7eaf0]
      /usr/bin/mongod(_ZN5mongo16NamespaceDetails6_allocEPKci+0) [0x6e1e90]
      /usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE+0x3a) [0x6e1efa]
      /usr/bin/mongod(_ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci+0x17f) [0x7247af]
      /usr/bin/mongod() [0x6c19d7]
      /usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x6bcdb2]
      /usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1dfd) [0x6206cd]
      /usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x11c) [0x622b6c]
      /usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4d4) [0x705ba4]
      /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x17d2) [0x70adf2]
      /usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageE+0x64) [0x70b494]
      /usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x2a2) [0x5588a2]
      /usr/bin/mongod(_ZN5mongo16CmdFindAndModify3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x140c) [0x7a287c]
      /usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa16) [0x797596]
      /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x798) [0x798538]
      /usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x5fb7e5]
      /usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1bbf) [0x60029f]
      /usr/bin/mongod() [0x7074ba]
      /usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x14d6) [0x70aaf6]

      Thu May 26 20:27:35 dbexit: ; exiting immediately

      Thu May 26 20:27:35 ERROR: Client::~Client _context should be null but is not; client:conn
      Thu May 26 20:27:35 [conn31] end connection 127.0.0.1:55337
      Thu May 26 20:27:36 [conn50043] end connection 10.86.197.56:48503
      Thu May 26 20:27:36 [conn430] end connection 10.212.71.69:40076
      Thu May 26 20:27:37 [conn52518] end connection 10.198.107.95:58198
      50/245 20%
      88/245 35%
      Thu May 26 20:27:39 Got signal: 11 (Segmentation fault).

      Thu May 26 20:27:39 Backtrace:
      0x824629 0x7f3429c7eaf0 0x52c2b5 0x701d57 0x702551 0x827a08 0x83a4b0 0x7f342a7829ca 0x7f3429d3170d
      /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x824629]
      /lib/libc.so.6(+0x33af0) [0x7f3429c7eaf0]
      /usr/bin/mongod(_ZN5mongo9MongoFile13closeAllFilesERSt18basic_stringstreamIcSt11char_traitsIcESaIcEE+0xa5) [0x52c2b5]
      /usr/bin/mongod(_ZN5mongo8shutdownEv+0x3a7) [0x701d57]
      /usr/bin/mongod(_ZN5mongo6dbexitENS_8ExitCodeEPKc+0x201) [0x702551]
      /usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x13f8) [0x827a08]
      /usr/bin/mongod(thread_proxy+0x80) [0x83a4b0]
      /lib/libpthread.so.0(+0x69ca) [0x7f342a7829ca]
      /lib/libc.so.6(clone+0x6d) [0x7f3429d3170d]

      The node had stopped responding, at least from the time of the first message:

      Thu May 26 20:25:06 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files

      to the last:

      Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files

      but very likely as early as 10 minutes before that; yet a failover in the replica set occurred only at the time of the segfault.
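To pin down that window on a similar node, the first and last occurrence of the error can be pulled straight from the log. This is only a sketch: the function name `first_last_emfile` is made up for illustration, and the log path is assumed to be passed as the first argument.

```shell
#!/bin/sh
# Print the first and the last log line containing "Too many open files".
# $1: path to the log file (hypothetical helper, not part of mongod).
first_last_emfile() {
    awk 'index($0, "Too many open files") {
             n++                      # count matching lines
             if (n == 1) first = $0   # remember the first match
             last = $0                # keep overwriting: ends as the last match
         }
         END {
             if (n >= 1) print first
             if (n >= 2) print last   # avoid printing a single match twice
         }' "$1"
}
```

On the node above this would be called as `first_last_emfile /var/log/mongodb/mongodb.log`. Comparing the two timestamps with the time of the failover shows how long the node kept refusing connections before the replica set reacted.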

      Do let me know if you need more information...
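In case it helps with reproducing this, here is a rough way to compare a process's open descriptors with its soft limit on Linux. It is a sketch under assumptions: `fd_usage` is a hypothetical helper name, it relies on `/proc/<pid>/fd` and `/proc/<pid>/limits` being available, and reading another user's process usually requires root.

```shell
#!/bin/sh
# Print "<open fds> <soft limit>" for one process, using Linux /proc.
# $1: process id (hypothetical helper, not part of mongod).
fd_usage() {
    pid=$1
    # Each entry in /proc/<pid>/fd is one open descriptor.
    used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$used $limit"
}
```

With a single mongod running, `fd_usage "$(pidof mongod)"` prints two numbers. When the first approaches the second (1024 on the node above), `accept()` can be expected to start failing with errno 24.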

            Assignee: Unassigned
            Reporter: Pieter Ennes (skion)
            Votes: 1
            Watchers: 3
