- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major - P3
- None
- Affects Version/s: 1.6.5
- Component/s: Stability
- Labels: None
- Environment: Ubuntu 10.04 LTS on Amazon EC2 large nodes on mdadm/lvm EBS disks
- Linux
The primary node of a replica set ran out of file descriptors, which was logged millions of times:
root@m2:~# ulimit -n
1024
root@m2:~# grep "Too many open files" /var/log/mongodb/mongodb.log | wc -l
6574844
but in the end this resulted in a segfault, shown in the tail of the log:
Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
[6M times]
Thu May 26 20:27:35 [conn1126823] Uncaught std::exception: boost::filesystem::basic_directory_iterator constructor: Too many open files: "/mnt/mongo/_tmp/esort.1306441653.672251719/", terminating
Thu May 26 20:27:35 dbexit:
Thu May 26 20:27:35 [conn1126823] shutdown: going to close listening sockets...
Thu May 26 20:27:35 [conn1126823] closing listening socket: 18
Thu May 26 20:27:35 [conn1126823] closing listening socket: 20
Thu May 26 20:27:35 [conn1126823] shutdown: going to flush oplog...
Thu May 26 20:27:35 [conn1126823] shutdown: going to close sockets...
Thu May 26 20:27:35 [conn1126823] shutdown: waiting for fs preallocator...
Thu May 26 20:27:35 [conn1126823] shutdown: closing all files...
Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
Thu May 26 20:27:35 [conn8] end connection 127.0.0.1:55336
Thu May 26 20:27:35 [conn1127914] assertion 11600 interrupted at shutdown ns:synth.current query:{ query: {}, $snapshot: true }
Thu May 26 20:27:35 [conn1127914] query synth.current exception 1214ms
Thu May 26 20:27:35 [conn1127914] SocketException in connThread, closing client connection
Thu May 26 20:27:35 [conn7] end connection 127.0.0.1:55335
Thu May 26 20:27:35 ERROR: Client::shutdown not called: slaveTracking
Thu May 26 20:27:35 Got signal: 11 (Segmentation fault).
Thu May 26 20:27:35 [conn52293] end connection 10.254.238.86:54967
Thu May 26 20:27:35 Backtrace:
0x824629 0x7f3429c7eaf0 0x6e1e90 0x6e1efa 0x7247af 0x6c19d7 0x6bcdb2 0x6206cd 0x622b6c 0x705ba4 0x70adf2 0x70b494 0x5588a2 0x7a287c 0x797596 0x798538 0x5fb7e5 0x60029f 0x7074ba 0x70aaf6
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x824629]
/lib/libc.so.6(+0x33af0) [0x7f3429c7eaf0]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails6_allocEPKci+0) [0x6e1e90]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPKciRNS_7DiskLocE+0x3a) [0x6e1efa]
/usr/bin/mongod(_ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci+0x17f) [0x7247af]
/usr/bin/mongod() [0x6c19d7]
/usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x6bcdb2]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1dfd) [0x6206cd]
/usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x11c) [0x622b6c]
/usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4d4) [0x705ba4]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x17d2) [0x70adf2]
/usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageE+0x64) [0x70b494]
/usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x2a2) [0x5588a2]
/usr/bin/mongod(_ZN5mongo16CmdFindAndModify3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x140c) [0x7a287c]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa16) [0x797596]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x798) [0x798538]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x5fb7e5]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1bbf) [0x60029f]
/usr/bin/mongod() [0x7074ba]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x14d6) [0x70aaf6]
Thu May 26 20:27:35 dbexit: ; exiting immediately
Thu May 26 20:27:35 ERROR: Client::~Client _context should be null but is not; client:conn
Thu May 26 20:27:35 [conn31] end connection 127.0.0.1:55337
Thu May 26 20:27:36 [conn50043] end connection 10.86.197.56:48503
Thu May 26 20:27:36 [conn430] end connection 10.212.71.69:40076
Thu May 26 20:27:37 [conn52518] end connection 10.198.107.95:58198
50/245 20%
88/245 35%
Thu May 26 20:27:39 Got signal: 11 (Segmentation fault).
Thu May 26 20:27:39 Backtrace:
0x824629 0x7f3429c7eaf0 0x52c2b5 0x701d57 0x702551 0x827a08 0x83a4b0 0x7f342a7829ca 0x7f3429d3170d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x824629]
/lib/libc.so.6(+0x33af0) [0x7f3429c7eaf0]
/usr/bin/mongod(_ZN5mongo9MongoFile13closeAllFilesERSt18basic_stringstreamIcSt11char_traitsIcESaIcEE+0xa5) [0x52c2b5]
/usr/bin/mongod(_ZN5mongo8shutdownEv+0x3a7) [0x701d57]
/usr/bin/mongod(_ZN5mongo6dbexitENS_8ExitCodeEPKc+0x201) [0x702551]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x13f8) [0x827a08]
/usr/bin/mongod(thread_proxy+0x80) [0x83a4b0]
/lib/libpthread.so.0(+0x69ca) [0x7f342a7829ca]
/lib/libc.so.6(clone+0x6d) [0x7f3429d3170d]
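For anyone trying to reproduce this, here is a minimal sketch of how descriptor usage could be compared against the limit before the process hits it. It assumes a Linux `/proc` filesystem; the function names `fd_usage` and `is_near_limit` and the 80% threshold are hypothetical choices of mine, not anything mongod provides.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, read from /proc (Linux only).

    soft_limit is None if the limit line is missing or not numeric.
    """
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    # For pid="self" the count includes the descriptor used for this listing.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            # Line format: "Max open files   <soft>   <hard>   files"
            if line.startswith("Max open files"):
                value = line.split()[3]
                soft_limit = int(value) if value.isdigit() else None
                break
    return open_fds, soft_limit


def is_near_limit(open_fds, soft_limit, threshold=0.8):
    """True when usage has reached the given fraction of the soft limit."""
    # The 0.8 default is an arbitrary illustration, not a recommended value.
    return soft_limit is not None and open_fds >= threshold * soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()  # pass the mongod pid instead of "self" in practice
    print(f"open descriptors: {used}, soft limit: {limit}")
    if is_near_limit(used, limit):
        print("warning: close to the file descriptor limit")
```

Reading another process's `/proc/<pid>/fd` generally requires running as the same user or as root.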
The node was no longer responding, at least from the first message:
Thu May 26 20:25:06 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
to the last:
Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
but very likely already 10 minutes earlier; yet a failover in the replica set occurred only at the time of the segfault.
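The window between the first and last "Too many open files" message can be measured directly from the log. The following is a rough sketch, assuming the timestamp layout seen in the excerpts above; the helper name `error_window` is hypothetical, and since the log lines carry no year, the `year` parameter is an assumption supplied by the caller.

```python
from datetime import datetime

MARKER = "Too many open files"


def error_window(lines, year=2011):
    """Return (first, last, count) for log lines containing MARKER, or None.

    The year is not present in the log and is an assumed value.
    """
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # Lines start like "Thu May 26 20:25:06 ..."; the weekday is skipped.
        _weekday, month, day, clock = fields[:4]
        try:
            stamp = datetime.strptime(
                f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
            )
        except ValueError:
            continue  # line has the marker but no leading timestamp
        stamps.append(stamp)
    if not stamps:
        return None
    return min(stamps), max(stamps), len(stamps)


if __name__ == "__main__":
    sample = [
        "Thu May 26 20:25:06 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files",
        "Thu May 26 20:27:35 [conn8] end connection 127.0.0.1:55336",
        "Thu May 26 20:27:35 [initandlisten] Listener: accept() returns -1 errno:24 Too many open files",
    ]
    first, last, count = error_window(sample)
    print(count, (last - first).total_seconds())
```

For the two timestamps quoted here, the window comes out at 149 seconds. To scan the real log, pass an open file handle for `/var/log/mongodb/mongodb.log` as `lines`.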
Do let me know if you need more information...