Core Server / SERVER-17614

WT invariant crash with "hazard pointer table full" error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: WiredTiger
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      The setup is a sharded collection called "measurements" inside the "owamp" database. A single mongod instance, 3 config servers, and a mongos all run on the same machine.

      "measurements" contains 144 documents that look like this:

      { "_id" : ObjectId("5501d4ded73ff9282d792e81"), "identifier" : "a53ece24cea00ad7caf52bd43e48b59ff515ab49f8361d75767dc64f8a5f857f", "start" : NumberLong(1424217788), "end" : null, "source" : "BEAU_OW", "destination" : "SDPT_OW" }

      Issuing a query like the attached one crashes the shard 100% of the time, whether it is run directly from the CLI or through a language driver.

      The query is generated programmatically, which is why it's fairly long. The same query worked fine prior to 3.0. It also works again if I shorten it by removing elements from the $or list.

      This also affects 3.0.1rc0 and the latest nightly as of today, Mar 16 2015.
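
For readers without the attachment, here is a minimal sketch of how a long `$or` filter of this general shape could be generated. The attached query itself is not reproduced here: the helper name `build_or_query`, the choice of the `identifier` field as the match key, and the generated identifier values are all assumptions for illustration only.

```python
import hashlib


def build_or_query(identifiers):
    """Return a MongoDB-style filter document matching any of the given identifiers."""
    return {"$or": [{"identifier": ident} for ident in identifiers]}


# Hypothetical identifiers: 64-character hex strings, the same length as the
# "identifier" value in the sample document. These are NOT the reporter's data.
identifiers = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(144)]

query = build_or_query(identifiers)
print(len(query["$or"]))  # number of clauses in the $or list
```

A filter built this way could be handed to a driver's find call. Passing a shorter `identifiers` list corresponds to the workaround of removing elements from the $or list.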

      Description

      I can reliably crash the mongod process by issuing a particular query. The following appears in the mongod log when it happens:

      2015-03-16T18:50:25.693+0000 I SHARDING [conn23] remotely refreshing metadata for owamp_data.measurements with requested shard version 1|3||5501d3cc747828239734271b, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
      2015-03-16T18:50:25.694+0000 I SHARDING [conn23] collection owamp_data.measurements was previously unsharded, new metadata loaded with shard version 1|3||5501d3cc747828239734271b
      2015-03-16T18:50:25.694+0000 I SHARDING [conn23] collection version was loaded at version 1|3||5501d3cc747828239734271b, took 0ms
      2015-03-16T18:50:25.730+0000 E STORAGE  [conn23] WiredTiger (0) [1426531825:730461][33711:0x7fe49b724700], file:collection-53971--2806634737426111267.wt, cursor.search: session 0x2b35e80: hazard pointer table full
      2015-03-16T18:50:25.730+0000 I -        [conn23] Invariant failure: ret resulted in status UnknownError 12: Cannot allocate memory at src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 345
      2015-03-16T18:50:25.738+0000 I CONTROL  [conn23] 
       0xf09ea9 0xeacf81 0xe9238a 0xd2aa65 0x8f0ecc 0xa15c79 0x9dd760 0x9f4de8 0x9dd5c0 0x9ff005 0x9eeba2 0x9ef4ef 0xba0932 0xba0f9c 0xba12cf 0xb76aaf 0xb72686 0xa8e523 0x7fc40f 0xec0dab 0x31dec079d1 0x31de8e88fd
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B09EA9"},{"b":"400000","o":"AACF81"},{"b":"400000","o":"A9238A"},{"b":"400000","o":"92AA65"},{"b":"400000","o":"4F0ECC"},{"b":"400000","o":"615C79"},{"b":"400000","o":"5DD760"},{"b":"400000","o":"5F4DE8"},{"b":"400000","o":"5DD5C0"},{"b":"400000","o":"5FF005"},{"b":"400000","o":"5EEBA2"},{"b":"400000","o":"5EF4EF"},{"b":"400000","o":"7A0932"},{"b":"400000","o":"7A0F9C"},{"b":"400000","o":"7A12CF"},{"b":"400000","o":"776AAF"},{"b":"400000","o":"772686"},{"b":"400000","o":"68E523"},{"b":"400000","o":"3FC40F"},{"b":"400000","o":"AC0DAB"},{"b":"31DEC00000","o":"79D1"},{"b":"31DE800000","o":"E88FD"}],"processInfo":{ "mongodbVersion" : "3.1.0-pre-", "gitVersion" : "629eb083a2094b7a096b29d66504a8f34e1a1d60", "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.29.2.el6.x86_64", "version" : "#1 SMP Tue Sep 9 21:36:05 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "628470E698486E100E56509E55623D3CBF4625EE" }, { "b" : "7FFFCB8A8000", "elfType" : 3, "buildId" : "5474F0D8DAF3D6177E2C4B06F3892745CB43B4D5" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "334EE34819BB0C2AB8E5C1A7D4C5B0FE3D0C9C48" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "9AFC7D1D9E75D7B740C82462DD002A81614C57E0" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "ED99110E629209C5CA6C0ED704F2C5CE3171513A" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "7D8E9374F4A4EA38A7C1E763F32240EA113E4208" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "246C3BAB0AB093AFD59D34C8CBF29E786DE4BE97" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "E4EAB3C200B7D8444FF95AB01F6466924A6A5F5F" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "50487A3480233636C29DBCAD5DE65421808948AB" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "D9A44621797C990C639FF2D5AA452AB559C277DE" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "D180B6297A9A302693053BD753A85D04A88DE811" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "FF9705F60A59F28CA0FC50720A4F18FA9A889BD6" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F8B68F301C19BF06AF56B4B06E0A69F89D2C1F8D" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf09ea9]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xeacf81]
       mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xe9238a]
       mongod(_ZNK5mongo21WiredTigerRecordStore10findRecordEPNS_16OperationContextERKNS_8RecordIdEPNS_10RecordDataE+0x165) [0xd2aa65]
       mongod(_ZNK5mongo10Collection7findDocEPNS_16OperationContextERKNS_8RecordIdEPNS_11SnapshottedINS_7BSONObjEEE+0x3C) [0x8f0ecc]
       mongod(_ZN5mongo16WorkingSetCommon5fetchEPNS_16OperationContextEPNS_16WorkingSetMemberEPKNS_10CollectionE+0x79) [0xa15c79]
       mongod(_ZN5mongo10FetchStage4workEPm+0x290) [0x9dd760]
       mongod(_ZN5mongo7OrStage4workEPm+0xB8) [0x9f4de8]
       mongod(_ZN5mongo10FetchStage4workEPm+0xF0) [0x9dd5c0]
       mongod(_ZN5mongo16ShardFilterStage4workEPm+0x55) [0x9ff005]
       mongod(_ZN5mongo14MultiPlanStage12workAllPlansEmPNS_15PlanYieldPolicyE+0xE2) [0x9eeba2]
       mongod(_ZN5mongo14MultiPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0xEF) [0x9ef4ef]
       mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0x72) [0xba0932]
       mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKNS_10CollectionERKSsNS0_11YieldPolicyEPPS0_+0x7C) [0xba0f9c]
       mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextEPNS_10WorkingSetEPNS_9PlanStageEPNS_13QuerySolutionEPNS_14CanonicalQueryEPKNS_10CollectionENS0_11YieldPolicyEPPS0_+0x7F) [0xba12cf]
       mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionEPNS_14CanonicalQueryENS_12PlanExecutor11YieldPolicyEPPS6_m+0xCF) [0xb76aaf]
       mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x636) [0xb72686]
       mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xAF3) [0xa8e523]
       mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDF) [0x7fc40f]
       mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xec0dab]
       libpthread.so.0(+0x79D1) [0x31dec079d1]
       libc.so.6(clone+0x6D) [0x31de8e88fd]
      -----  END BACKTRACE  -----
      2015-03-16T18:50:25.738+0000 I -        [conn23] 
       
      ***aborting after invariant() failure
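
The raw addresses listed at the top of the trace can be cross-checked against the JSON backtrace: each frame's absolute address is its load base `b` plus its offset `o`, both hexadecimal. A small sketch of that arithmetic, using three frames copied from the log above; the function name `frame_address` is an assumption for illustration.

```python
def frame_address(frame):
    """Absolute address of a backtrace frame: load base plus offset (both hex strings)."""
    return int(frame["b"], 16) + int(frame["o"], 16)


# Frames copied from the JSON backtrace in the log
frames = [
    {"b": "400000", "o": "B09EA9"},    # mongod: printStackTrace
    {"b": "400000", "o": "92AA65"},    # mongod: WiredTigerRecordStore::findRecord
    {"b": "31DEC00000", "o": "79D1"},  # libpthread.so.0
]

addresses = [hex(frame_address(f)) for f in frames]
print(addresses)  # ['0xf09ea9', '0xd2aa65', '0x31dec079d1']
```

These values match the bracketed addresses next to the corresponding symbol lines in the trace, so the frame that hit the invariant is the `findRecord` call at 0xd2aa65.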
      