[SERVER-28933] Mongod terminates when out of file descriptors
Created: 24/Apr/17  Updated: 27/Oct/23  Resolved: 27/Apr/17

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 3.2.12
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Tuomas Silen
Assignee: Mark Agarunov
Resolution: Works as Designed
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-28974 Mongos leak connections to mongods Closed
Operating System: ALL
Steps To Reproduce:

The problem occurred multiple times within a short period, so it should be possible to reproduce it simply by using up all available connections on a busy mongod.
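
A minimal sketch of how the connections might be filled, assuming a disposable test mongod that you own. The helper names `open_connections` and `close_all` are hypothetical and not part of any MongoDB tooling. The script only opens idle TCP connections and holds them. The log in the description shows mongod creating a thread after accepting each connection, so every held connection should cost the server one descriptor and one thread.

```python
import socket


def open_connections(host, port, count, timeout=2.0):
    """Open up to `count` idle TCP connections and return the sockets.

    Stops at the first failure, which may come from the client's own
    descriptor limit rather than from the server.
    """
    conns = []
    for _ in range(count):
        try:
            conns.append(socket.create_connection((host, port), timeout=timeout))
        except OSError as exc:
            print(f"stopped after {len(conns)} connections: {exc}")
            break
    return conns


def close_all(conns):
    """Close every socket returned by open_connections."""
    for conn in conns:
        conn.close()


# Example use against a test instance (host and port are assumptions):
#   held = open_connections("127.0.0.1", 27017, 1000)
#   ... watch the mongod log, then ...
#   close_all(held)
```

The client is subject to its own open-files limit, so several client processes may be needed to approach a server-side limit of 64000.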


 Description   

Too many connections consume all file descriptors (or some other resource?), which causes mongod to crash:

Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [initandlisten] failed to create thread after accepting new connection, closing connection
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [NetworkInterfaceASIO-BGSync-0] terminate() called. An exception is active; attempting to gather more information
Apr 23 17:07:23 mongo5 mongod.27017[14595]: [NetworkInterfaceASIO-BGSync-0] std::exception::what(): Resource temporarily unavailable
Actual exception type: std::system_error
 
 0x1315b12 0x1315432 0x1b143c6 0x1b143f3 0x12a5f7a 0x12a64d8 0x10d86ae 0x10d8f8e 0x10d96d6 0x10c100d 0x10c1cca 0x10c22c8 0x10bf9b0 0x1093aa0 0x10a282c 0x10a2ce8 0x13334e1 0x1333701 0x10b92e9 0x1b5c600 0x7fc272ae8064 0x7fc27281d62d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F15B12","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F15432"},{"b":"400000","o":"17143C6","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"400000","o":"17143F3"},{"b":"400000","o":"EA5F7A","s":"_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv"},{"b":"400000","o":"EA64D8","s":"_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE"},{"b":"400000","o":"CD86AE","s":"_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESC_St11unique_lockISt5mutexE"},{"b":"400000","o":"CD8F8E","s":"_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESt11unique_lockISt5mutexE"},{"b":"400000","o":"CD96D6"},{"b":"400000","o":"CC100D","s":"_ZN5mongo8executor20NetworkInterfaceASIO18_completeOperationEPNS1_7AsyncOpERKNS_10StatusWithINS0_21RemoteCommandResponseEEE"},{"b":"400000","o":"CC1CCA","s":"_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE"},{"b":"400000","o":"CC22C8"},{"b":"400000","o":"CBF9B0"},{"b":"400000","o":"C93AA0","s":"_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"CA282C","s":"_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"CA2CE8","s":"_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEE
EEE11do_completeEPvPNS0_19scheduler_operationERKSF_m"},{"b":"400000","o":"F334E1","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"400000","o":"F33701","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"400000","o":"CB92E9"},{"b":"400000","o":"175C600","s":"execute_native_thread_routine"},{"b":"7FC272AE0000","o":"8064"},{"b":"7FC272735000","o":"E862D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-4-amd64", "version" : "#1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07)", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A9B18E620F02487C1F2B5F355EB119AC75B94CD7" }, { "b" : "7FFF697E2000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "1F644FA66B1BCBBE1359CEF9A63CFD4C8F0C6011" }, { "b" : "7FC273A1C000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "21115992A1F885E1ACE88AADA60F126AD9759D03" }, { "b" : "7FC273620000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "32E9A5B9EED626E93DEEB00A49033F78652DB9A3" }, { "b" : "7FC273418000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "906B9D78305E46BC76994F552FA63751C51CD065" }, { "b" : "7FC273214000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "EDDA40FF0B16D74E776AEA74FAAE6B898ACD2D15" }, { "b" : "7FC272F13000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1B8F009691E3224A991F1F6517A74DA30A065B9A" }, { "b" : "7FC272CFD000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "D5FB04F64B3DAEA6D6B68B5E8B9D4D2BC1A6E1FC" }, { "b" : "7FC272AE0000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "1ADC4ADBA1D853EEA9A5B3CD49E25AF85DCA0100" }, { "b" : 
"7FC272735000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "7A02D454BA0E8AF69E3A284C381318B55908DEDA" }, { "b" : "7FC273C7D000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "060BF28EEE293312DDF82D4DBEF40B3BA8927F0A" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1315b12]
 mongod(+0xF15432) [0x1315432]
 mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x1b143c6]
 mongod(+0x17143F3) [0x1b143f3]
 mongod(_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv+0xA0A) [0x12a5f7a]
 mongod(_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE+0x348) [0x12a64d8]
 mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESC_St11unique_lockISt5mutexE+0x2AE) [0x10d86ae]
 mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESt11unique_lockISt5mutexE+0x3E) [0x10d8f8e]
 mongod(+0xCD96D6) [0x10d96d6]
 mongod(_ZN5mongo8executor20NetworkInterfaceASIO18_completeOperationEPNS1_7AsyncOpERKNS_10StatusWithINS0_21RemoteCommandResponseEEE+0x34D) [0x10c100d]
 mongod(_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE+0x6A) [0x10c1cca]
 mongod(+0xCC22C8) [0x10c22c8]
 mongod(+0xCBF9B0) [0x10bf9b0]
 mongod(_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_+0x70) [0x1093aa0]
 mongod(_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_+0x89C) [0x10a282c]
 mongod(_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m+0x228) [0x10a2ce8]
 mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+0x2F1) [0x13334e1]
 mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0xC1) [0x1333701]
 mongod(+0xCB92E9) [0x10b92e9]
 mongod(execute_native_thread_routine+0x20) [0x1b5c600]
 libpthread.so.0(+0x8064) [0x7fc272ae8064]
 libc.so.6(clone+0x6D) [0x7fc27281d62d]
-----  END BACKTRACE  -----

Effective limits:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1033430              1033430              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
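
The table above has the layout Linux uses for `/proc/<pid>/limits`. As a sketch, assuming that layout (columns separated by two or more spaces), the text can be parsed into a dictionary so the limits of a running mongod can be compared across hosts. `parse_limits` and `SAMPLE` are hypothetical names, and the sample rows are copied from the table.

```python
import re


def _to_int(value):
    """Return None for 'unlimited', otherwise the integer value."""
    return None if value == "unlimited" else int(value)


def parse_limits(text):
    """Parse /proc/<pid>/limits-style text into {name: (soft, hard, units)}."""
    limits = {}
    for line in text.splitlines():
        # Columns are padded with runs of spaces; names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3 or fields[0] == "Limit":
            continue  # skip blank lines and the header row
        name, soft, hard = fields[:3]
        units = fields[3] if len(fields) > 3 else ""
        limits[name] = (_to_int(soft), _to_int(hard), units)
    return limits


SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            8388608              unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max nice priority         0                    0
"""

if __name__ == "__main__":
    for name, (soft, hard, units) in parse_limits(SAMPLE).items():
        print(f"{name}: soft={soft} hard={hard} {units}".rstrip())
```

On a live host the same function could be fed the contents of `/proc/<pid>/limits` for the mongod process, which reflects the limits actually in effect rather than those of the current shell.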



 Comments   
Comment by Mark Agarunov [ 27/Apr/17 ]

Hello devastor,

Thank you for the report. From the output and description you've provided, this appears to be expected behavior. Due to the way the WiredTiger engine works internally, it consumes a large number of file descriptors. If you are regularly seeing this error, my recommendation would be to further increase the max open files limit via ulimit.

Thanks,
Mark
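
As a hedged illustration of the ulimit recommendation, the sketch below uses Python's standard `resource` module to read the open-files and process limits of the current process and to raise the soft open-files limit to the hard limit. It does not change a running mongod; for that, the limit would need to be set in whatever starts the service before mongod launches. Note also that the failing call in the log is `pthread_create` with errno 11, which on Linux is `EAGAIN`; that error covers thread-creation resources in general, so checking the process limit alongside open files may be worthwhile. That reading is an assumption and is not confirmed in the ticket. The function names are hypothetical.

```python
import errno
import os
import resource


def fmt(value):
    """Render a limit value the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits relevant to this report."""
    names = {
        "open files": resource.RLIMIT_NOFILE,
        "processes": resource.RLIMIT_NPROC,  # on Linux this also caps threads
    }
    return {label: resource.getrlimit(which) for label, which in names.items()}


def raise_soft_nofile():
    """Try to raise the soft open-files limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Failures are
    ignored because an unprivileged process may not be allowed to change it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (ValueError, OSError):
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("errno 11:", errno.errorcode.get(11), "-", os.strerror(11))
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```

Raising the hard limit itself requires privileges and is normally done in the init or service configuration, not from inside the process.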
