Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19673

Excessive memory allocated by WiredTiger journal

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.5
    • Fix Version/s: 3.0.6, 3.1.7
    • Component/s: Replication
    • Labels:
    • Environment:
      amazon linux 2015.03 on r3.2xl instances
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      Linux
    • Backport Completed:

      Description

      We have a mongodb replica set consisting of 2 servers + 1 arbiter, all running mongodb 3.0.4, with the wired tiger storage engine.

      I updated the secondary to 3.0.5. It appeared to start syncing from the primary (the state transitioned to secondary and the logs said it was syncing from the primary) but crashed after about 40 seconds with this output in the logs:

      2015-07-30T19:56:36.491+0000 E STORAGE  [rsSync] WiredTiger (12) [1438286196:491627][2537:0x7f07a2c1b700], session.commit_transaction: memory allocation: Cannot allocate memory
      2015-07-30T19:56:36.526+0000 I -        [rsSync] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError 12: Cannot allocate memory at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 305
      2015-07-30T19:56:36.542+0000 I CONTROL  [rsSync] 
       0xf77ca9 0xf16b61 0xefd29a 0xd9727c 0xd9738a 0x930c3c 0xc512b2 0xcbba40 0xcbcc68 0xcb53af 0xfc4e64 0x7f07b3f5cdf3 0x7f07b2a101bd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B77CA9"},{"b":"400000","o":"B16B61"},{"b":"400000","o":"AFD29A"},{"b":"400000","o":"99727C"},{"b":"400000","o":"99738A"},{"b":"400000","o":"530C3C"},{"b":"400000","o":"8512B2"},{"b":"400000","o":"8BBA40"},{"b":"400000","o":"8
      BCC68"},{"b":"400000","o":"8B53AF"},{"b":"400000","o":"BC4E64"},{"b":"7F07B3F55000","o":"7DF3"},{"b":"7F07B291A000","o":"F61BD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "
      Linux", "release" : "3.14.44-32.39.amzn1.x86_64", "version" : "#1 SMP Thu Jun 11 20:33:38 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E8A046532FE2C5A8474250BA8E5B72C6C84B9D68" }, { "b" : "7FFC6EC17000", "elfTy
      pe" : 3, "buildId" : "3C9325999495818299EC622C63D9330D234E02C9" }, { "b" : "7F07B3F55000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F07B3CE8000", "path" : "/usr/lib64/libssl.so.10",
       "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E5953949728676D3796" }, { "b" : "7F07B3903000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F1C146B78505646930DD9003AA2B3622C5226D1B" }, { "b" : "7F07B36FB000", "path" : "/lib64/librt.so.1"
      , "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F07B34F7000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F07B31F3000", "path" : "/usr/lib64/libstdc++.s
      o.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F07B2EF1000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F07B2CDB000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F07B291A000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "DF6DA145A649EA093507A635AF383F608E7CE3F2" }, { "b" : "7F07B4171000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F07B26D7000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DE5A9F7A11A0881CB64E375F4DDCA58028F0FAF8" }, { "b" : "7F07B23F2000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "A3E43FC66908AC8B00773707FECA3B1677AFF311" }, { "b" : "7F07B21EF000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7F07B1FC4000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "B10FBFEC246C4EAD1719D16090D0BE54904BBFC9" }, { "b" : "7F07B1DAE000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F07B1BA3000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "7292C0673D7C116E3389D3FFA67087A6B9287A71" }, { "b" : "7F07B19A0000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F07B1786000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F07B1565000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf77ca9]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf16b61]
       mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xefd29a]
       mongod(_ZN5mongo22WiredTigerRecoveryUnit9_txnCloseEb+0x2EC) [0xd9727c]
       mongod(_ZN5mongo22WiredTigerRecoveryUnit7_commitEv+0x8A) [0xd9738a]
       mongod(_ZN5mongo15WriteUnitOfWork6commitEv+0x1C) [0x930c3c]
       mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt5dequeINS_7BSONObjESaIS4_EE+0x282) [0xc512b2]
       mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERSt5dequeINS_7BSONObjESaIS5_EE+0x460) [0xcbba40]
       mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEv+0x118) [0xcbcc68]
       mongod(_ZN5mongo4repl13runSyncThreadEv+0x1EF) [0xcb53af]
       mongod(+0xBC4E64) [0xfc4e64]
       libpthread.so.0(+0x7DF3) [0x7f07b3f5cdf3]
       libc.so.6(clone+0x6D) [0x7f07b2a101bd]
      -----  END BACKTRACE  -----
      2015-07-30T19:56:36.542+0000 I -        [rsSync] 
       
      ***aborting after invariant() failure
      

      I restarted mongodb and the same thing happened. I downgraded to mongodb 3.0.4 and the secondary was able to rejoin the replica set and catch up as normal.

      The secondary wasn't under any read load at the time, but the primary was under a heavy write load

        Attachments

        1. collstats
          5 kB
        2. log
          23 kB
        3. mongod.conf
          0.3 kB

          Issue Links

            Activity

              People

              • Votes:
                2 Vote for this issue
                Watchers:
                31 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: