Core Server / SERVER-35666

mongod process consumes all memory and then exits or is killed by the OOM killer

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.6.5
    • Component/s: Stability
    • Labels:
    • Environment:
      Google Cloud Instance n1-highmem-4 (4 vCPUs, 26 GB memory) with Debian 9.4.
    • Operating System: ALL
    • Steps To Reproduce:
      1. Initiate a chunk migration.
      2. Wait several hours; mongod will either exit with an out-of-memory error or be killed by the OOM killer.

      A two-shard MongoDB cluster regularly crashes with an out-of-memory error or is killed by the OOM killer. The system runs on GCE with Debian 9.4, MongoDB v3.6.5 and the WiredTiger storage engine. The servers are n1-highmem-4 (4 vCPUs, 26 GB memory). Only mongod runs on these servers; there are no other services. The mongos routers run on separate servers.

      Initially the servers had no swap, as is common practice on Google Cloud. A small 1 GB swap was added later, but the problem persists.

      We tried tuning the cacheSizeGB parameter and reduced it to 10 GB, but the crashes still occur.
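For context, the MongoDB documentation (3.4 and later) gives the default WiredTiger internal cache size as the larger of 50% of (RAM - 1 GB) or 256 MB. The minimal Python sketch below assumes that formula applies unchanged to 3.6.5 and shows what the default would have been on these 26 GB machines compared with the 10 GB that was tried. Keep in mind that cacheSizeGB bounds only the WiredTiger cache; mongod also allocates memory outside it, so lowering the setting does not cap the total memory of the process.

```python
def default_wt_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache size in GB, assuming the documented 3.4+ formula:
    the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB)."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


if __name__ == "__main__":
    ram_gb = 26.0         # n1-highmem-4 memory, from the Environment field
    configured_gb = 10.0  # cacheSizeGB value tried by the reporter
    print(f"default cache: {default_wt_cache_gb(ram_gb)} GB")
    print(f"configured cache: {configured_gb} GB")
    # RAM left for everything mongod allocates outside the WiredTiger cache
    print(f"headroom outside cache: {ram_gb - configured_gb} GB")
```

With 26 GB of RAM the default works out to 12.5 GB, so the 10 GB setting leaves about 16 GB for allocations outside the cache.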

      The process typically crashes or exits at least once a day.

      A chunk migration is currently under way, and the crashes happen only on the shard instances that receive chunks. The instances sending chunks have never crashed.

      When the mongod process is killed by the OOM killer, the following appears in syslog:

      Jun 15 14:45:17 server4 kernel: [1731430.432189] Out of memory: Kill process 13130 (mongod) score 980 or sacrifice child
      Jun 15 14:45:17 server4 kernel: [1731430.441717] Killed process 13130 (mongod) total-vm:28280536kB, anon-rss:26174876kB, file-rss:0kB, shmem-rss:0kB
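To put the kernel figures in more familiar units, here is a small Python sketch that extracts the PID, command name and memory sizes from a "Killed process" line and converts kB to GiB. The regular expression is written against the line format shown above and is an assumption; other kernel versions may print additional fields.

```python
import re

# Matches the "Killed process" line the Linux OOM killer writes to the kernel log
# (format assumed from the syslog excerpt above).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024


def parse_oom_kill(line: str):
    """Return pid, command name and memory sizes (GiB) from an OOM-kill line, or None."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "total_vm_gib": int(m["total_vm_kb"]) / KIB_PER_GIB,
        "anon_rss_gib": int(m["anon_rss_kb"]) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    line = ("Jun 15 14:45:17 server4 kernel: [1731430.441717] Killed process 13130 (mongod) "
            "total-vm:28280536kB, anon-rss:26174876kB, file-rss:0kB, shmem-rss:0kB")
    info = parse_oom_kill(line)
    print(f"{info['comm']} (pid {info['pid']}): "
          f"anon-rss {info['anon_rss_gib']:.2f} GiB, total-vm {info['total_vm_gib']:.2f} GiB")
```

For the log line above this reports about 24.96 GiB of anonymous memory and 26.97 GiB of virtual memory, i.e. mongod held nearly all of the machine's RAM and far more than the 10 GB configured for the WiredTiger cache.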

      Other times, mongod exits on its own and leaves the following in mongod.log:

      2018-06-15T02:14:32.456+0200 F - [rsSync] out of memory.
      0x55cbc8535751 0x55cbc8534d84 0x55cbc8623b4b 0x55cbc86c665c 0x55cbc70fccff 0x55cbc70f8b02 0x55cbc707b3f1 0x55cbc86449b0 0x7fbbf3507494 0x7fbbf3249acf
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"55CBC6305000","o":"2230751","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55CBC6305000","o":"222FD84","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"55CBC6305000","o":"231EB4B"},{"b":"55CBC6305000","o":"23C165C","s":"_Znam"},{"b":"55CBC6305000","o":"DF7CFF","s":"_ZN5mongo4repl8SyncTail7OpQueueC1Ev"},{"b":"55CBC6305000","o":"DF3B02","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE"},{"b":"55CBC6305000","o":"D763F1","s":"_ZN5mongo4repl10RSDataSync4_runEv"},{"b":"55CBC6305000","o":"233F9B0"},{"b":"7FBBF3500000","o":"7494"},{"b":"7FBBF3161000","o":"E8ACF","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.5", "gitVersion" : "a20ecd3e3a174162052ff99913bc2ca9a839d618", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-6-amd64", "version" : "#1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07)", "machine" : "x86_64" }, "somap" : [ { "b" : "55CBC6305000", "elfType" : 3, "buildId" : "7D4592BDFAA6C15459D2319DEAB7F10E9EB4E7D7" }, { "b" : "7FFC48D98000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "A3207CC9FE1CAA3374AE7061AA5C3C5619B8A0E5" }, { "b" : "7FBBF4743000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "713D47D5F599289C0A91ADE8F0122B2B4AA78B2E" }, { "b" : "7FBBF42B0000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "2CFE882A331D7857E9CE1B5DE3255E6DA76EF899" }, { "b" : "7FBBF4044000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "E2AA3B39763D943F56B3BD05C8E36E639BA95E12" }, { "b" : "7FBBF3E40000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "B895F0831F623C5F23603401D4069F9F94C24761" }, { "b" : "7FBBF3C38000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "5D83E0642E645026DBB11F89F7DF7106BD821495" }, { "b" : "7FBBF3934000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : 
"1B95E3A8B8788B07E4F59EE69B1877F9DEB42033" }, { "b" : "7FBBF371D000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7FBBF3500000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "4285CD3158DDE596765C747AE210AB6CBD258B22" }, { "b" : "7FBBF3161000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "AA889E26A70F98FA8D230D088F7CC5BF43573163" }, { "b" : "7FBBF495A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "263F909DBE11A66F7C6233E3FF0521148D9F8370" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55cbc8535751]
       mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x84) [0x55cbc8534d84]
       mongod(+0x231EB4B) [0x55cbc8623b4b]
       mongod(_Znam+0x21C) [0x55cbc86c665c]
       mongod(_ZN5mongo4repl8SyncTail7OpQueueC1Ev+0x7F) [0x55cbc70fccff]
       mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE+0x402) [0x55cbc70f8b02]
       mongod(_ZN5mongo4repl10RSDataSync4_runEv+0x111) [0x55cbc707b3f1]
       mongod(+0x233F9B0) [0x55cbc86449b0]
       libpthread.so.0(+0x7494) [0x7fbbf3507494]
       libc.so.6(clone+0x3F) [0x7fbbf3249acf]
      ----- END BACKTRACE -----
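Each frame in the JSON backtrace carries a load base ("b") and an offset ("o"); adding them reproduces the absolute addresses in the symbolized listing, which is a quick way to confirm the two views agree. Demangled (for example with c++filt), `_Znam` is `operator new[](unsigned long)` and the frame above it is the constructor `mongo::repl::SyncTail::OpQueue::OpQueue()`, so the allocation that failed appears to be an array allocation made while constructing the OpQueue used by `SyncTail::oplogApplication` on the rsSync thread. The Python sketch below uses three frames copied from the log; the helper name is hypothetical.

```python
import json

# Three frames copied from the "backtrace" array above. "b" is the load base of
# the mapped object and "o" is the frame's offset within it (both hexadecimal).
BACKTRACE_JSON = """[
  {"b":"55CBC6305000","o":"2230751","s":"_ZN5mongo15printStackTraceERSo"},
  {"b":"55CBC6305000","o":"23C165C","s":"_Znam"},
  {"b":"55CBC6305000","o":"DF7CFF","s":"_ZN5mongo4repl8SyncTail7OpQueueC1Ev"}
]"""


def absolute_addresses(frames):
    """Return (symbol, absolute address) pairs computed as base + offset."""
    result = []
    for frame in frames:
        addr = int(frame["b"], 16) + int(frame["o"], 16)
        # Some frames have no "s" (symbol) field; mark those as unknown
        result.append((frame.get("s", "?"), addr))
    return result


if __name__ == "__main__":
    for symbol, addr in absolute_addresses(json.loads(BACKTRACE_JSON)):
        print(f"{addr:#x}  {symbol}")
```

The computed addresses (0x55cbc8535751, 0x55cbc86c665c, 0x55cbc70fccff) match the corresponding lines of the symbolized backtrace.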

      I suspect that memory is not being managed properly while the shard is receiving chunks.

      Attachments:
        1. mem_without_wt_cache.png (9 kB)
        2. memstack1.png (186 kB)
        3. memstack2.png (186 kB)
        4. pinned_cursors.png (11 kB)
        5. stacks.html (338 kB)
        6. suspects.png (204 kB)

            Assignee: Matthew Saltz (Inactive) <matthew.saltz@mongodb.com>
            Reporter: Sasa Skevin <sasa.skevin@gmail.com>
            Votes: 1
            Watchers: 15

              Created:
              Updated:
              Resolved: