  1. Core Server
  2. SERVER-33478

MongoDB doesn't shut down cleanly on ENOMEM / malloc returning NULL

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Internal Code
    • Labels:
      None
    • ALL

      Run with overcommit_memory=2 (to keep the OOM killer from wreaking havoc), then apply memory pressure (which can be transient).

      To induce this condition, set overcommit_ratio to something like 20 and do things that make MongoDB use lots of memory (load data, create indexes, etc.).
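
      For a rough sense of how small the commit budget gets with these settings, here is a minimal sketch of the strict-mode arithmetic. It assumes the kernel's documented formula with huge pages ignored (CommitLimit = swap + RAM * overcommit_ratio / 100). The function name and the example figures below are hypothetical and are not taken from the affected host.

```cpp
#include <cstdint>

// Approximate CommitLimit in kB under vm.overcommit_memory=2.
// Assumption: no huge pages reserved, so the limit is
//   swap + RAM * overcommit_ratio / 100
// (integer arithmetic, so the result is rounded down).
std::uint64_t commitLimitKb(std::uint64_t ramKb,
                            std::uint64_t swapKb,
                            unsigned ratioPercent) {
    return swapKb + ramKb * ratioPercent / 100;
}
```

      For a hypothetical host with 64 GiB of RAM and 4 GiB of swap, commitLimitKb(67108864, 4194304, 20) gives 17616076 kB, about 16.8 GiB of committable memory for the whole system, which a busy mongod can reach quickly.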


      This is different from the OOM killer, which can't be controlled. When MongoDB detects that the system cannot allocate memory, it should try to recover if possible rather than die immediately. If recovery is not possible, a clean shutdown would still be preferable.

      A pause in the server is far preferable to immediate process death: MongoDB can afford to flush or free cache memory under this type of memory pressure.

      ENOMEM or a NULL return from malloc does not need to cause MongoDB to blow itself up, especially as MongoDB usually owns most of the memory on a system.
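
      The behavior requested above could look roughly like the following sketch. This is not MongoDB's code or API: allocWithRecovery, releaseMemory and the injectable allocate parameter are hypothetical names used only to illustrate "free some cache and retry, otherwise fail in a way the caller can handle".

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>

// Hypothetical helper (not part of MongoDB): try to allocate, and if the
// allocator returns NULL, ask the caller to release memory (for example by
// evicting cache) and retry a bounded number of times.
// `allocate` is injectable so the failure path can be exercised without
// actually exhausting system memory; by default it calls std::malloc.
void* allocWithRecovery(
    std::size_t bytes,
    std::function<bool()> releaseMemory,
    int maxRetries = 3,
    std::function<void*(std::size_t)> allocate =
        [](std::size_t n) { return std::malloc(n); }) {
    for (int attempt = 0;; ++attempt) {
        if (void* p = allocate(bytes)) {
            return p;  // success, possibly after freeing memory
        }
        // Give up once retries are exhausted or nothing more can be freed.
        // Throwing lets callers unwind and shut down cleanly instead of
        // aborting the process on the spot.
        if (attempt >= maxRetries || !releaseMemory()) {
            throw std::bad_alloc();
        }
    }
}
```

      A caller would catch std::bad_alloc at a point where it can still flush state and exit in an orderly way. Whether this is practical inside a large server, where many code paths allocate, is a separate question that this sketch does not address.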

      The benefits of recovery and/or clean shutdown in general should be obvious. Gracefully handling system/resource limits could help avoid the potential for rollbacks and/or corrupted data.

      This would be very useful when overcommit is disabled on Linux, which is very desirable in enterprise server environments, where predictable behavior is highly preferable to allowing the OOM killer to kill things.

      Here's a backtrace from an unclean shutdown on out of memory from a replicated shard in a cluster.

      --------------- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"564C71955000","o":"15786B1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"564C71955000","o":"1577CE4","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"564C71955000","o":"14E5
      A81","s":"_ZN5mongo12mongoReallocEPvm"},{"b":"564C71955000","o":"889385","s":"_ZN5mongo11_BufBuilderINS_21SharedBufferAllocatorEE15grow_reallocateEi"},{"b":"564C71955000","o":"131FF75","s":"_ZN5mongo
      3rpc19CommandReplyBuilder22getInPlaceReplyBuilderEm"},{"b":"564C71955000","o":"A6F131","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"
      b":"564C71955000","o":"A70C61","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"564C71955000","o":"10895C0","s":"_ZN5mo
      ngo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"564C71955000","o":"C8EE98","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7Messa
      geERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"564C71955000","o":"88BFFD","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"564C71955000","o":"88
      C92D"},{"b":"564C71955000","o":"14E0401"},{"b":"7FF2C4B92000","o":"7E25"},{"b":"7FF2C47CF000","o":"F834D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.10", "gitVersion" : "078f28920cb24de0dd
      479b5ea6c66c644f6326e9", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-693.11.1.el7.x86_64", "version" : "#1 SMP Mon Dec 4 23:52:40 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "564C71955000", "elfType" : 3, "buildId" : "94C7FAB092E567C9338D13DB9B68751363D15EFD" }, { "b" : "7FFF4D0F9000", "elfType" : 3, "buildId" : "4D9C78C211890A0E48180A6194B1837FC9DECA70" }, { "b" : "7FF2C5B33000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "ED0AC7DEB91A242C194B3DEF27A215F41CE43116" }, { "b" : "7FF2C56D2000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "BC0AE9CA0705BEC1F0C0375AAD839843BB219CB1" }, { "b" : "7FF2C54CA000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "6D322588B36D2617C03C0F3B93677E62FCFFDA81" }, { "b" : "7FF2C52C6000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "1E42EBFB272D37B726F457D6FE3C33D2B094BB69" }, { "b" : "7FF2C4FC4000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId
      " : "808BD35686C193F218A5AAAC6194C49214CFF379" }, { "b" : "7FF2C4DAE000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "3E85E6D20D2CE9CDAD535084BEA56620BAAD687C" }, { "b" : "7FF2C4B920
      00", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "A48D21B2578A8381FBD8857802EAA660504248DC" }, { "b" : "7FF2C47CF000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "95FF0
      2A4BEBABC573C7827A66D447F7BABDDAA44" }, { "b" : "7FF2C5DA5000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "22FA66DA7D14C88BF36C69454A357E5F1DEFAE4E" }, { "b" : "7FF2C4582000"
      , "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DA322D74F55A0C4293085371A8D0E94B5962F5E7" }, { "b" : "7FF2C429A000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "B
      69E63024D408E400401EEA6815317BDA38FB7C2" }, { "b" : "7FF2C4096000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "A3832734347DCA522438308C9F08F45524C65C9B" }, { "b" : "7FF2C3E63000",
       "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "A48639BF901DB554479BFAD114CB354CF63D7D6E" }, { "b" : "7FF2C3C4D000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC
      8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7FF2C3A3F000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "6FDF5B013FD2739D304CFB9D723DCBC149EE03C9" }, { "b" : "7FF2C383B000", "p
      ath" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7FF2C3621000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FF4E72
      F4E574E143330FB3C66DB51613B0EC65EA" }, { "b" : "7FF2C33FA000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "A88379F56A51950A33198890D37F5F8AEE71F8B4" }, { "b" : "7FF2C3198000", "pat
      h" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x564c72ecd6b1]
       mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x84) [0x564c72eccce4]
       mongod(_ZN5mongo12mongoReallocEPvm+0x21) [0x564c72e3aa81]
       mongod(_ZN5mongo11_BufBuilderINS_21SharedBufferAllocatorEE15grow_reallocateEi+0x55) [0x564c721de385]
       mongod(_ZN5mongo3rpc19CommandReplyBuilder22getInPlaceReplyBuilderEm+0x35) [0x564c72c74f75]
       mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0xB1) [0x564c723c4131]
       mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF81) [0x564c723c5c61]
       mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x564c729de5c0]
       mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xD38) [0x564c725e3e98]
       mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x564c721e0ffd]
       mongod(+0x88C92D) [0x564c721e192d]
       mongod(+0x14E0401) [0x564c72e35401]
       libpthread.so.0(+0x7E25) [0x7ff2c4b99e25]
       libc.so.6(clone+0x6D) [0x7ff2c48c734d]
      -----  END BACKTRACE  -----
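
      The frames in the backtrace are mangled C++ symbols. As a reading aid, here is a small sketch that demangles them using abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang toolchains). The demangle wrapper itself is a hypothetical helper written for this example.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI C++ symbol name. If demangling fails, the input
// is returned unchanged.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc; we must free it.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : mangled;
    std::free(out);
    return result;
}
```

      For example, demangle("_ZN5mongo12mongoReallocEPvm") yields mongo::mongoRealloc(void*, unsigned long), and demangle("_ZN5mongo29reportOutOfMemoryErrorAndExitEv") yields mongo::reportOutOfMemoryErrorAndExit(). Read from the top, the trace shows a reply buffer growing through mongoRealloc, the reallocation failing, and the server reporting out-of-memory and exiting.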
      

            Assignee:
            kelsey.schubert@mongodb.com Kelsey Schubert
            Reporter:
            underrun Derek Wilson
            Votes:
            0
            Watchers:
            10

              Created:
              Updated:
              Resolved: