  Core Server / SERVER-33392

Server terminates when unable to create worker thread

    • Type: Improvement
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.2.0
    • Component/s: Stability
    • Labels: None
    • Sprint: Service Arch 2018-11-05, Service Arch 2018-11-19, Service Arch 2018-12-03, Service Arch 2018-12-17, Service Arch 2018-12-31, Service Arch 2019-01-14

      When the server is unable to create a thread, the outcome depends on the context:

      • Thread for an incoming connection: the server logs the failure, closes the connection, and continues running
        2018-02-16T10:39:32.289-0800 I NETWORK  [initandlisten] connection accepted from 10.x.x.x:xxxxx #94318774 (31890 connections now open)
        2018-02-16T10:39:32.289-0800 I NETWORK  [initandlisten] pthread_create failed: errno:11 Resource temporarily unavailable
        2018-02-16T10:39:32.291-0800 I NETWORK  [initandlisten] failed to create thread after accepting new connection, closing connection
        
      • Worker thread: the server logs the failure and then terminates
        2018-02-16T10:40:17.302-0800 F -        [NetworkInterfaceASIO-BGSync-0] std::exception::what(): Resource temporarily unavailable
        Actual exception type: std::system_error
        
         0x1351ff2 0x1351b42 0x1b37646 0x1b37673 0x12df774 0x12dfcc8 0x112011e 0x11209fe 0x112114c 0x111423f 0x11096bd 0x110a2da 0x110a8d8 0x1108070 0x10db4e0 0x10e99bc 0x10e9e78 0x136e5f1 0x136e811 0x1101d3f 0x1b7f610 0x7f7e80f8d184 0x7f7e80cba03d
        ----- BEGIN BACKTRACE -----
        {"backtrace":[{"b":"400000","o":"F51FF2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F51B42"},{"b":"400000","o":"1737646","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"400000","o":"1737673"},{"b":"400000","o":"EDF774","s":"_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv"},{"b":"400000","o":"EDFCC8","s":"_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE"},{"b":"400000","o":"D2011E","s":"_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESC_St11unique_lockISt5mutexE"},{"b":"400000","o":"D209FE","s":"_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESt11unique_lockISt5mutexE"},{"b":"400000","o":"D2114C"},{"b":"400000","o":"D1423F","s":"_ZN5mongo8executor20NetworkInterfaceASIO7AsyncOp6finishERKNS_10StatusWithINS0_21RemoteCommandResponseEEE"},{"b":"400000","o":"D096BD","s":"_ZN5mongo8executor20NetworkInterfaceASIO18_completeOperationEPNS1_7AsyncOpERKNS_10StatusWithINS0_21RemoteCommandResponseEEE"},{"b":"400000","o":"D0A2DA","s":"_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE"},{"b":"400000","o":"D0A8D8"},{"b":"400000","o":"D08070"},{"b":"400000","o":"CDB4E0","s":"_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"CE99BC","s":"_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_"},{"b":"400000","o":"CE9E78","s":"_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m"},{"b":"400000","o":"F6E5F1","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"400000","o":"F6E811","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"400000","o":"D01D3F"},{"b":"400000","o":"177F610","s":"execute_native_thread_routine"},{"b":"7F7E80F85000","o":"8184"},{"b":"7F7E80BBC000","o":"FE03D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.16", "gitVersion" : "056bf45128114e44c5358c7a8776fb582363e094", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-79-generic", "version" : "#100~14.04.1-Ubuntu SMP Fri May 19 18:36:51 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "08B7F4039582A49C92A2B9D92929EF6B690B4F4A" }, { "b" : "7FFFEF77E000", "elfType" : 3, "buildId" : "3449FF93C74CB63856A9BE01B606A0BB1DE26BE3" }, { "b" : "7F7E81EA7000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "1287BAA0C3440FDF4F9A5AB267445129A9DBD14E" }, { "b" : "7F7E81ACB000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "3F882E7949FA0CB52422985A88CDD7E6182CBD70" }, { "b" : "7F7E818C3000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "4F930712D3609C93E380E5BE5DF73E7AD273531C" }, { "b" : "7F7E816BF000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "034D6A4EE9DCAB4A34ABD644345CBBB42DC63088" }, { "b" : "7F7E813B9000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "300C7884CDEB5667BEA2357D2B8E7A76397562D6" }, { "b" : "7F7E811A3000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F7E80F85000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "F64B8AD471FBA1B7A3A64EFB01551E694975E1F7" }, { "b" : "7F7E80BBC000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "D9A10B8EF90300628DD0A3A535106967714D7328" }, { "b" : "7F7E82106000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2CA513EDC89C7BC06EC183D1A3A03CC0F606319C" } ] }}
         mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1351ff2]
         mongod(+0xF51B42) [0x1351b42]
         mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x1b37646]
         mongod(+0x1737673) [0x1b37673]
         mongod(_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv+0xA34) [0x12df774]
         mongod(_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE+0x348) [0x12dfcc8]
         mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESC_St11unique_lockISt5mutexE+0x2AE) [0x112011e]
         mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor23scheduleIntoPool_inlockEPSt4listISt10shared_ptrINS1_13CallbackStateEESaIS5_EERKSt14_List_iteratorIS5_ESt11unique_lockISt5mutexE+0x3E) [0x11209fe]
         mongod(+0xD2114C) [0x112114c]
         mongod(_ZN5mongo8executor20NetworkInterfaceASIO7AsyncOp6finishERKNS_10StatusWithINS0_21RemoteCommandResponseEEE+0x14F) [0x111423f]
         mongod(_ZN5mongo8executor20NetworkInterfaceASIO18_completeOperationEPNS1_7AsyncOpERKNS_10StatusWithINS0_21RemoteCommandResponseEEE+0x35D) [0x11096bd]
         mongod(_ZN5mongo8executor20NetworkInterfaceASIO20_completedOpCallbackEPNS1_7AsyncOpE+0x6A) [0x110a2da]
         mongod(+0xD0A8D8) [0x110a8d8]
         mongod(+0xD08070) [0x1108070]
         mongod(_ZN4asio6detail14strand_service8dispatchINS0_7binder2IRSt8functionIFvSt10error_codemEES5_mEEEEvRPNS1_11strand_implERT_+0x70) [0x10db4e0]
         mongod(_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder2INS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS8_EEEENS_17mutable_buffers_1ENS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEESI_mEESK_EEEEvRPNS1_11strand_implERT_+0x89C) [0x10e99bc]
         mongod(_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS6_EEEES2_NS0_14transfer_all_tENS0_15wrapped_handlerINS_10io_service6strandESt8functionIFvSt10error_codemEENS0_26is_continuation_if_runningEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSF_m+0x228) [0x10e9e78]
         mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+0x2F1) [0x136e5f1]
         mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0xC1) [0x136e811]
         mongod(+0xD01D3F) [0x1101d3f]
         mongod(execute_native_thread_routine+0x20) [0x1b7f610]
         libpthread.so.0(+0x8184) [0x7f7e80f8d184]
         libc.so.6(clone+0x6D) [0x7f7e80cba03d]
        -----  END BACKTRACE  -----
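The difference between the two paths can be sketched in C++. This is a minimal illustration, not MongoDB's code: `acceptConnection`, `startWorker`, `spawnDetached`, `spawnFailsWithEagain` and the injectable `Spawner` type are all hypothetical. It assumes, as the log's "Actual exception type: std::system_error" suggests, that a failed thread creation surfaces the way the `std::thread` constructor reports it: a `std::system_error` whose code is `std::errc::resource_unavailable_try_again` (EAGAIN, errno 11 on Linux). The connection path catches the exception and closes the connection; the worker path has no handler, so the exception propagates. In the backtrace above, the C++ runtime's terminate path (`__cxxabiv1::__terminate`) is entered directly from `mongo::ThreadPool::_startWorkerThread_inlock()`.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <iostream>
#include <system_error>
#include <thread>
#include <utility>

// Hypothetical spawner type: starts a thread that runs `fn`. Injectable so a
// failure can be simulated without actually exhausting the thread limits.
using Spawner = std::function<void(std::function<void()>)>;

// Real spawner: the std::thread constructor throws std::system_error
// (resource_unavailable_try_again) if the thread cannot be started.
void spawnDetached(std::function<void()> fn) {
    std::thread(std::move(fn)).detach();
}

// Simulates pthread_create failing with EAGAIN.
void spawnFailsWithEagain(std::function<void()>) {
    throw std::system_error(EAGAIN, std::generic_category());
}

// Connection path (first log excerpt): log, close the connection, keep running.
// Returns false when the connection has to be closed.
bool acceptConnection(const Spawner& spawn, int connId) {
    try {
        spawn([] { /* serve the connection */ });
        return true;
    } catch (const std::system_error& e) {
        std::cerr << "failed to create thread for connection #" << connId
                  << " (" << e.what() << "), closing connection\n";
        return false;  // the caller closes the socket; the server continues
    }
}

// Worker path (second log excerpt): no handler here, so the exception
// propagates to the caller. If it escapes a noexcept function or a thread's
// entry point, the runtime calls std::terminate().
void startWorker(const Spawner& spawn) {
    spawn([] { /* run queued tasks */ });
}
```

With the simulated EAGAIN spawner, `acceptConnection` returns `false` and the process keeps running, whereas `startWorker` lets the `std::system_error` escape to whatever called it.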
        

      Failure to create a thread is often the result of a temporary condition, i.e. EAGAIN ("Resource temporarily unavailable", errno 11 in the logs above), which pthread_create returns when the system lacks the resources to create another thread or a limit on the number of threads (such as the RLIMIT_NPROC soft limit) has been reached. In this case, terminating the server is an overly drastic response. It would be much better if the server handled the situation more gracefully, e.g. by failing only the operation that caused the worker thread creation to be attempted, perhaps with a message informing the requesting application or user of the temporary failure and advising them to try again.
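A minimal sketch of that graceful alternative, assuming thread creation is reported via `std::system_error` as above. `scheduleOrFail`, `ScheduleResult` and `Spawner` are hypothetical names, not the actual `mongo::ThreadPool::schedule` API. Instead of letting the exception escape, the failure is converted into an error for the operation that needed the thread, and EAGAIN is flagged as retryable so the caller can tell the client to try again.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical result type, loosely modeled on a status object: either OK, or
// an error message that can be returned to the requesting client.
struct ScheduleResult {
    bool ok;
    bool retryable;  // true for transient failures such as EAGAIN
    std::string message;
};

// Hypothetical spawner: starts a thread running the given function.
using Spawner = std::function<void(std::function<void()>)>;

// Tries to start a worker thread for `task`. A thread-creation failure fails
// only this operation instead of propagating and terminating the process.
ScheduleResult scheduleOrFail(const Spawner& spawn, std::function<void()> task) {
    try {
        spawn(std::move(task));
        return {true, false, ""};
    } catch (const std::system_error& e) {
        const bool transient =
            e.code() == std::errc::resource_unavailable_try_again;
        std::string msg = "could not create worker thread: ";
        msg += e.what();
        if (transient) {
            msg += "; this is a temporary failure, please retry";
        }
        return {false, transient, msg};
    }
}
```

The `retryable` flag separates the transient EAGAIN case the issue describes from other creation errors, which still fail the operation but without the advice to retry.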

      If the operation that caused the worker thread to be created is server startup, then terminating the server is acceptable, since startup itself has failed. Worker threads that are created after startup but are absolutely essential could also terminate the server when they cannot be created, but presumably this would not apply to all of them (e.g. threads for ASIO egress).
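The policy proposed above can be written as a small decision function. This is a hypothetical sketch: the `Phase` and `Action` enums, the `essentialThread` flag and `onThreadCreateFailure` are illustrative names, and deciding which threads count as essential is left to the caller.

```cpp
#include <cassert>

// Hypothetical classification of when a thread-creation failure happened.
enum class Phase { Startup, Running };

// What the server should do about the failure.
enum class Action { TerminateServer, FailOperation };

// A failure during startup, or for a thread the server cannot run without,
// is fatal; any other failure only fails the operation that needed the thread.
Action onThreadCreateFailure(Phase phase, bool essentialThread) {
    if (phase == Phase::Startup || essentialThread) {
        return Action::TerminateServer;
    }
    return Action::FailOperation;
}
```

Under this sketch, an ASIO egress thread that fails to start after startup (`Phase::Running`, not essential) would only fail the corresponding operation, which matches the issue's presumption that not every post-startup thread is essential.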

            Assignee: ADAM Martin (Inactive)
            Reporter: Kevin Pulo
            Votes: 0
            Watchers: 13
