  Core Server / SERVER-19673

Excessive memory allocated by WiredTiger journal

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.5
    • Fix Version/s: 3.0.6, 3.1.7
    • Component/s: Replication
    • Environment:
      amazon linux 2015.03 on r3.2xl instances
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      Linux

      Description

      We have a MongoDB replica set consisting of 2 servers + 1 arbiter, all running MongoDB 3.0.4 with the WiredTiger storage engine.

      I upgraded the secondary to 3.0.5. It appeared to start syncing from the primary (its state transitioned to SECONDARY and the logs said it was syncing from the primary), but it crashed after about 40 seconds with this output in the logs:

      2015-07-30T19:56:36.491+0000 E STORAGE  [rsSync] WiredTiger (12) [1438286196:491627][2537:0x7f07a2c1b700], session.commit_transaction: memory allocation: Cannot allocate memory
      2015-07-30T19:56:36.526+0000 I -        [rsSync] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError 12: Cannot allocate memory at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 305
      2015-07-30T19:56:36.542+0000 I CONTROL  [rsSync] 
       0xf77ca9 0xf16b61 0xefd29a 0xd9727c 0xd9738a 0x930c3c 0xc512b2 0xcbba40 0xcbcc68 0xcb53af 0xfc4e64 0x7f07b3f5cdf3 0x7f07b2a101bd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B77CA9"},{"b":"400000","o":"B16B61"},{"b":"400000","o":"AFD29A"},{"b":"400000","o":"99727C"},{"b":"400000","o":"99738A"},{"b":"400000","o":"530C3C"},{"b":"400000","o":"8512B2"},{"b":"400000","o":"8BBA40"},{"b":"400000","o":"8
      BCC68"},{"b":"400000","o":"8B53AF"},{"b":"400000","o":"BC4E64"},{"b":"7F07B3F55000","o":"7DF3"},{"b":"7F07B291A000","o":"F61BD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "
      Linux", "release" : "3.14.44-32.39.amzn1.x86_64", "version" : "#1 SMP Thu Jun 11 20:33:38 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E8A046532FE2C5A8474250BA8E5B72C6C84B9D68" }, { "b" : "7FFC6EC17000", "elfTy
      pe" : 3, "buildId" : "3C9325999495818299EC622C63D9330D234E02C9" }, { "b" : "7F07B3F55000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F07B3CE8000", "path" : "/usr/lib64/libssl.so.10",
       "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E5953949728676D3796" }, { "b" : "7F07B3903000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F1C146B78505646930DD9003AA2B3622C5226D1B" }, { "b" : "7F07B36FB000", "path" : "/lib64/librt.so.1"
      , "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F07B34F7000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F07B31F3000", "path" : "/usr/lib64/libstdc++.s
      o.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F07B2EF1000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F07B2CDB000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F07B291A000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "DF6DA145A649EA093507A635AF383F608E7CE3F2" }, { "b" : "7F07B4171000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F07B26D7000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DE5A9F7A11A0881CB64E375F4DDCA58028F0FAF8" }, { "b" : "7F07B23F2000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "A3E43FC66908AC8B00773707FECA3B1677AFF311" }, { "b" : "7F07B21EF000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F07B1FC4000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "B10FBFEC246C4EAD1719D16090D0BE54904BBFC9" }, { "b" : "7F07B1DAE000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F07B1BA3000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "7292C0673D7C116E3389D3FFA67087A6B9287A71" }, { "b" : "7F07B19A0000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F07B1786000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F07B1565000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77ca9]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf16b61]
       mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xefd29a]
       mongod(_ZN5mongo22WiredTigerRecoveryUnit9_txnCloseEb+0x2EC) [0xd9727c]
       mongod(_ZN5mongo22WiredTigerRecoveryUnit7_commitEv+0x8A) [0xd9738a]
       mongod(_ZN5mongo15WriteUnitOfWork6commitEv+0x1C) [0x930c3c]
       mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt5dequeINS_7BSONObjESaIS4_EE+0x282) [0xc512b2]
       mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERSt5dequeINS_7BSONObjESaIS5_EE+0x460) [0xcbba40]
       mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEv+0x118) [0xcbcc68]
       mongod(_ZN5mongo4repl13runSyncThreadEv+0x1EF) [0xcb53af]
       mongod(+0xBC4E64) [0xfc4e64]
       libpthread.so.0(+0x7DF3) [0x7f07b3f5cdf3]
       libc.so.6(clone+0x6D) [0x7f07b2a101bd]
      -----  END BACKTRACE  -----
      2015-07-30T19:56:36.542+0000 I -        [rsSync] 
       
      ***aborting after invariant() failure
      

      I restarted mongod and the same thing happened. I downgraded to MongoDB 3.0.4 and the secondary was able to rejoin the replica set and catch up as normal.

      The secondary wasn't under any read load at the time, but the primary was under a heavy write load.

      Attachments

      1. collstats (5 kB), Frederick Cheung
      2. log (23 kB), Frederick Cheung
      3. mongod.conf (0.3 kB), Frederick Cheung


          Activity

          Dan Pasette added a comment -

          Sue LoVerso has identified the fix for this issue as WT-2007. This issue was exacerbated by increasing the number of log slots to 128 in 3.0.5 (as part of WT-1989), but was present in earlier versions as well.
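          For scale, the buffer total can be read directly on a running node and compared with the process's resident memory, since these journal buffers live outside the WT cache. A quick diagnostic sketch (not part of the original ticket; it uses the statistic Bruce Lucas highlights in a later comment and assumes the usual units of bytes for the WT statistic and MB for mem.resident):

            // Sketch: how much memory are the WT journal (log slot) buffers holding,
            // compared with mongod's resident set? Units as assumed in the note above.
            var ss = db.serverStatus();
            var logBufMB = ss.wiredTiger.log['total log buffer size'] / (1024 * 1024);
            print('WT journal buffers: ' + logBufMB.toFixed(1) + ' MB');
            print('mongod resident memory: ' + ss.mem.resident + ' MB');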

          Michael Cahill added a comment -

          The backport of WT-2007 has been completed; the current (small) set of WT updates for 3.0.6 is being tested here:

          https://evergreen.mongodb.com/version/55bf0d8a3ff12268ad000030_0

          Ramon Fernandez added a comment -

          The fix for WT-2007 made it into wiredtiger's mongodb-3.0 branch:

          commit fad2eacb13ebeb0dfd3a001ca45d82eff83a9af3
          Author: sueloverso <sue@mongodb.com>
          Date:   Wed Jul 15 10:20:59 2015 -0400
           
              WT-2007 Statically allocate log slot buffers to a maximum size.
              
              Merge pull request #2062 from wiredtiger/log-buf-static-max
              
              (cherry picked from commit e1d8bc7fa41efe1d4765cca2b9857907802fcf1b)
          

          This was merged into mongo's master and v3.0 branches, so I'm resolving this ticket.

          Bruce Lucas added a comment -

          The key statistic for diagnosing this issue is the following:

          db.serverStatus().wiredTiger.log['total log buffer size']
          

          This is memory used by mongod for the WT journal, and it is not limited by the WT cache setting. If it is large (e.g., several GB), an OOM kill may result. The fix for this issue limits the amount of memory used by the WT journal to (I believe) 32 MB.
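          As a follow-up check (a sketch, not an official procedure; the ~32 MB cap is as hedged above, and the 128 MB threshold below is an arbitrary margin chosen for illustration), a fixed build (3.0.6+ / 3.1.7+) should keep this statistic in the tens of megabytes, while gigabyte-range values match the pre-fix behavior reported in this ticket:

            // Warn if the WT journal buffer total is far above the ~32 MB post-fix cap
            // mentioned above. 128 MB is an arbitrary margin for this sketch, not a
            // documented limit; 'total log buffer size' is assumed to be in bytes.
            var bytes = db.serverStatus().wiredTiger.log['total log buffer size'];
            var mb = bytes / (1024 * 1024);
            if (mb > 128) {
                print('WARNING: WT journal buffers at ' + mb.toFixed(0) + ' MB (see SERVER-19673)');
            } else {
                print('WT journal buffers look bounded: ' + mb.toFixed(1) + ' MB');
            }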

          Frederick Cheung added a comment -

          Ah, even on our 3.0.4 instance that metric was 13G on our secondary.


            People

             • Votes: 2
             • Watchers: 31
