Core Server / SERVER-41085

Mongo swallowed all the server memory until crashing

    Details

    • Type: Bug
    • Status: Waiting For User Input
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 3.4.16
    • Fix Version/s: None
    • Labels: None
    • Operating System: ALL

      Description

      Last night we saw a sudden growth in mongod memory usage that ended with the mongod instance crashing. The host has 256 GB of RAM, so by default the WiredTiger cache should be capped at about 127 GB (50% of (RAM - 1 GB)), but mongod was using more than 230 GB just before the crash (06:20:07 AM).
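
      The expected cache ceiling quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the documented default for 3.4 (the larger of 50% of (RAM - 1 GB) and 256 MB) and assuming the host reports exactly 256 GiB; the function name `default_wt_cache_bytes` is made up for illustration and is not a MongoDB API.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wt_cache_bytes(total_ram_bytes: int) -> int:
    """Default WiredTiger cache size: max(50% of (RAM - 1 GiB), 256 MiB)."""
    return max((total_ram_bytes - GIB) // 2, 256 * MIB)


# Assumption: the 256 GB host exposes exactly 256 GiB to mongod.
ram_bytes = 256 * GIB
cache_bytes = default_wt_cache_bytes(ram_bytes)

# Reported usage before the crash, taken from the description (more than 230 GB).
observed_gib = 230

print(f"default cache ceiling: {cache_bytes / GIB:.1f} GiB")
print(f"usage above the ceiling: at least {observed_gib - cache_bytes / GIB:.1f} GiB")
```

      The cache setting only bounds the WiredTiger cache itself. Memory allocated elsewhere in the process (for example by connections or the JavaScript engine) is not counted against it, so the gap computed here is memory that the cache limit alone does not explain.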

      Here are the last lines of the log, including the crash stack trace:

      2019-05-10T06:20:07.453+0200 I - [conn3747849] Invariant failure _scope->exec( "$arr = [];", "group reduce init 2", false, true, false , 2 * 1000) src/mongo/db/exec/group.cpp 113
      2019-05-10T06:20:07.453+0200 I - [conn3747849] 
       
      ***aborting after invariant() failure
       
       
      2019-05-10T06:20:07.463+0200 I COMMAND [conn3617951] command engagement_console_production.mtn_gha#tasks command: find { find: "mtn_gha#tasks", filter: { queue: "global", $where: CodeWScope( this.sla_expires_at && (!this.last_prioritized_at || this.last_prioritized_at <= this.sla_expires_at), {}) }, sort: { priority: -1, created_at: 1 } } planSummary: IXSCAN { queue: 1, priority: -1, created_at: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:131 locks:{ Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 1771ms
      2019-05-10T06:20:07.470+0200 I NETWORK [conn3770955] received client metadata from 10.10.11.17:27959 conn3770955: { driver: { name: "mongo-ruby-driver", version: "2.4.3" }, os: { type: "linux", name: "linux-gnu", architecture: "x86_64" }, platform: "mongoid-5.2.1, 2.5.3, x86_64-linux, x86_64-pc-linux-gnu" }
      2019-05-10T06:20:07.473+0200 I NETWORK [conn3770957] received client metadata from 10.10.11.17:27962 conn3770957: { driver: { name: "mongo-ruby-driver", version: "2.4.3" }, os: { type: "linux", name: "linux-gnu", architecture: "x86_64" }, platform: "mongoid-5.2.1, 2.5.3, x86_64-linux, x86_64-pc-linux-gnu" }
      2019-05-10T06:20:07.555+0200 F - [conn3747849] Got signal: 6 (Aborted).
       
      0x55d929283171 0x55d929282389 0x55d92928286d 0x7fe6bee870e0 0x7fe6beb09fff 0x7fe6beb0b42a 0x55d9285241be 0x55d928899808 0x55d92889b0da 0x55d9288af5c3 0x55d928bb896a 0x55d928bb928b 0x55d9287b5bb1 0x55d9287729af 
      0x55d9287740aa 0x55d928d8f480 0x55d928990f52 0x55d928992f56 0x55d92859497d 0x55d9285952ad 0x55d9291e90d1 0x7fe6bee7d4a4 0x7fe6bebbfd0f
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"55D927D08000","o":"157B171","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55D927D08000","o":"157A389"},{"b":"55D927D08000","o":"157A86D"},{"b":"7FE6BEE76000","o":"110E0"},{"b":"7FE6BEAD7000","o":"32FFF","s":"gsignal"},{"b":"7FE6BEAD7000","o":"3442A","s":"abort"},{"b":"55D927D08000","o":"81C1BE","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"55D927D08000","o":"B91808","s":"_ZN5mongo10GroupStage18initGroupScriptingEv"},{"b":"55D927D08000","o":"B930DA","s":"_ZN5mongo10GroupStage6doWorkEPm"},{"b":"55D927D08000","o":"BA75C3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"55D927D08000","o":"EB096A","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE"},{"b":"55D927D08000","o":"EB128B","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"55D927D08000","o":"AADBB1"},{"b":"55D927D08000","o":"A6A9AF","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"55D927D08000","o":"A6C0AA","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"55D927D08000","o":"1087480","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"55D927D08000","o":"C88F52"},{"b":"55D927D08000","o":"C8AF56","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"55D927D08000","o":"88C97D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"55D927D08000","o":"88D2AD"},{"b":"55D927D08000","o":"14E10D1"},{"b":"7FE6BEE76000","o":"74A4"},{"b":"7FE6BEAD7000","o":"E8D0F","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.16", "gitVersion" : "0d6a9242c11b99ddadcfb6e86a850b6ba487530a", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-8-amd64", "version" : "#1 SMP Debian 4.9.144-3 (2019-02-02)", "machine" : "x86_64" }, "somap" : [ { "b" : "55D927D08000", "elfType" : 3, "buildId" : "36452F27FE7A41D0E57DDE38A17B3FAE9980B0BE" }, { "b" : "7FFCE2DCE000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "BEC7DDC38F3120E790C01B2A8A67C441C9B915CF" }, { "b" : "7FE6BFDB6000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "90275AC4DD8167F60BC7C599E0DBD63741D8F191" }, { "b" : "7FE6BF9BA000", "path" : "/opt/dell/srvadmin/lib64/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "FD6376149047833953B0269E84DE181CA45DBE90" }, { "b" : "7FE6BF7B2000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "86B35D63FACD97D22973E99EE9863F7714C4F53A" }, { "b" : "7FE6BF5AE000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DB2CAEEEC37482A98AB1416D0A9AFE2944930DE9" }, { "b" : "7FE6BF2AA000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "4E49714C557CE0472C798F39365CA10F9C0E1933" }, { "b" : "7FE6BF093000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7FE6BEE76000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "16D609487BCC4ACBAC29A4EAA2DDA0D2F56211EC" }, { "b" : "7FE6BEAD7000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "775143E680FF0CD4CD51CCE1CE8CA216E635A1D6" }, { "b" : "7FE6C0017000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "606DF9C355103E82140D513BC7A25A635591C153" } ] }}
      mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55d929283171]
      mongod(+0x157A389) [0x55d929282389]
      mongod(+0x157A86D) [0x55d92928286d]
      libpthread.so.0(+0x110E0) [0x7fe6bee870e0]
      libc.so.6(gsignal+0xCF) [0x7fe6beb09fff]
      mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x55d9285241be]
      mongod(_ZN5mongo10GroupStage18initGroupScriptingEv+0x418) [0x55d928899808]
      mongod(_ZN5mongo10GroupStage6doWorkEPm+0x14A) [0x55d92889b0da]
      mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x55d9288af5c3]
      mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x19A) [0x55d928bb896a]
      mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x4B) [0x55d928bb928b]
      mongod(+0xAADBB1) [0x55d9287b5bb1]
      mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x4FF) [0x55d9287729af]
      mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF6A) [0x55d9287740aa]
      mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0x55d928d8f480]
      mongod(+0xC88F52) [0x55d928990f52]
      mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x746) [0x55d928992f56]
      mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0x55d92859497d]
      mongod(+0x88D2AD) [0x55d9285952ad]
      mongod(+0x14E10D1) [0x55d9291e90d1]
      libpthread.so.0(+0x74A4) [0x7fe6bee7d4a4]
      libc.so.6(clone+0x3F) [0x7fe6bebbfd0f]
      ----- END BACKTRACE -----
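
      For readers scanning the backtrace above, the frame lines can be split into module, symbol, and offset, and the simplest mangled names turned into readable ones. This is only a sketch: `parse_frames` and `qualified_name` are hypothetical helpers written for this illustration, not MongoDB tooling, and `qualified_name` handles just plain nested names of the form `_ZN<len><name>...`; a real demangler such as `c++filt` is needed for anything more complex.

```python
import re

# Matches frame lines such as:
#   mongod(_ZN5mongo10GroupStage6doWorkEPm+0x14A) [0x55d92889b0da]
#   mongod(+0x157A389) [0x55d929282389]
FRAME_RE = re.compile(
    r"^\s*(?P<module>[^\s(]+)\((?P<symbol>[^+)]*)\+0x(?P<offset>[0-9A-Fa-f]+)\)"
    r"\s+\[0x(?P<addr>[0-9A-Fa-f]+)\]\s*$"
)


def parse_frames(text):
    """Return one dict per recognised frame line; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append({
                "module": m.group("module"),
                "symbol": m.group("symbol") or None,  # None when unsymbolized
                "offset": int(m.group("offset"), 16),
                "addr": int(m.group("addr"), 16),
            })
    return frames


def qualified_name(mangled):
    """Partial demangling: join the length-prefixed parts after '_ZN'."""
    if not mangled.startswith("_ZN"):
        return mangled
    parts, i = [], 3
    while i < len(mangled) and mangled[i].isdigit():
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        length = int(mangled[i:j])
        parts.append(mangled[j:j + length])
        i = j + length
    return "::".join(parts) if parts else mangled


# Three frames copied from the trace in this report.
sample = """
      mongod(_ZN5mongo10GroupStage18initGroupScriptingEv+0x418) [0x55d928899808]
      mongod(+0x157A389) [0x55d929282389]
      libc.so.6(gsignal+0xCF) [0x7fe6beb09fff]
"""

frames = parse_frames(sample)
for frame in frames:
    name = qualified_name(frame["symbol"]) if frame["symbol"] else "<no symbol>"
    print(frame["module"], name, hex(frame["offset"]))
```

      Applied to the full trace, this shows the abort being raised from `mongo::GroupStage::initGroupScripting`, which matches the invariant failure reported at src/mongo/db/exec/group.cpp line 113.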
      

      Last 5 days memory usage:

      Memory usage some hours before the crash:

      Last 5 days operation counter:

      Operations counter before the crash:

      I've prepared an archive with the last 5 days of diagnostic data. Can you give me a portal link to upload it?

      MongoDB version: 3.4.16
      This looks similar to SERVER-37795.

        Attachments

        1. mongo_memory_usage.png (24 kB)
        2. mongo_operations_counter.png (39 kB)
        3. Screen Shot 2019-05-20 at 4.49.10 PM.png (153 kB)
        4. Screen Shot 2019-06-05 at 10.31.37 AM.png (32 kB)
        5. Screenshot from 2019-05-10 18-07-45.png (22 kB)
        6. Screenshot from 2019-05-10 18-10-08.png (38 kB)

              People

              • Votes: 2
              • Watchers: 8