Core Server / SERVER-16763

mongod terminates due to mongo::DBTryLockTimeoutException during longevity test with wiredTiger

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 3.0.0-rc6
    • Affects Version/s: 2.8.0-rc3, 2.8.0-rc4
    • Component/s: Concurrency
    • Labels:
    • Fully Compatible
    • ALL

      mongod crashed during mixed read/write traffic testing. The thread that raised the exception is sharding related: it was running moveChunk.

      This happens after about 3 days of execution. I reproduced the same issue with both rc3 and rc4.

      Here is the log from the crash (from rc4):

      2015-01-05T22:54:49.372+0000 F -        [conn60] terminate() called. An exception is active; attempting to gather more information
      2015-01-05T22:54:49.441+0000 F -        [conn60] std::exception::what(): std::exception
      Actual exception type: mongo::DBTryLockTimeoutException
      
       0xf133b9 0xf12eb0 0x7fb3ac8bb6c6 0x7fb3ac8ba789 0x7fb3ac8bb33a 0x7fb3ac358913 0x7fb3ac358e47 0x9954a4 0xdac52c 0xdae3f0 0x9ad054 0x9adf93 0x9aea4b 0xb7ca1a 0xa8fcd5 0x7e41f0 0xed1381 0x7fb3acf74f18 0x7fb3ac086b9d
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B133B9"},{"b":"400000","o":"B12EB0"},{"b":"7FB3AC85D000","o":"5E6C6"},{"b":"7FB3AC85D000","o":"5D789"},{"b":"7FB3AC85D000","o":"5E33A"},{"b":"7FB3AC349000","o":"F913"},{"b":"7FB3AC349000","o":"FE47"},{"b":"400000","o":"5954A4"},{"b":"400000","o":"9AC52C"},{"b":"400000","o":"9AE3F0"},{"b":"400000","o":"5AD054"},{"b":"400000","o":"5ADF93"},{"b":"400000","o":"5AEA4B"},{"b":"400000","o":"77CA1A"},{"b":"400000","o":"68FCD5"},{"b":"400000","o":"3E41F0"},{"b":"400000","o":"AD1381"},{"b":"7FB3ACF6D000","o":"7F18"},{"b":"7FB3ABFA4000","o":"E2B9D"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc4", "gitVersion" : "3ad571742911f04b307f0071979425511c4f2570", "uname" : { "sysname" : "Linux", "release" : "3.14.19-17.43.amzn1.x86_64", "version" : "#1 SMP Wed Sep 17 22:14:52 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFA4AFE000", "elfType" : 3 }, { "b" : "7FB3ACF6D000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7FB3ACD65000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7FB3ACB61000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7FB3AC85D000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7FB3AC55F000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7FB3AC349000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7FB3ABFA4000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7FB3AD189000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf133b9]
       mongod(+0xB12EB0) [0xf12eb0]
       libstdc++.so.6(+0x5E6C6) [0x7fb3ac8bb6c6]
       libstdc++.so.6(+0x5D789) [0x7fb3ac8ba789]
       libstdc++.so.6(__gxx_personality_v0+0x52A) [0x7fb3ac8bb33a]
       libgcc_s.so.1(+0xF913) [0x7fb3ac358913]
       libgcc_s.so.1(_Unwind_Resume+0x57) [0x7fb3ac358e47]
       mongod(_ZN5mongo4Lock10GlobalReadC2EPNS_6LockerEj+0x84) [0x9954a4]
       mongod(_ZN5mongo17MigrateFromStatus4doneEPNS_16OperationContextE+0x8C) [0xdac52c]
       mongod(_ZN5mongo16MoveChunkCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1CE0) [0xdae3f0]
       mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9ad054]
       mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9adf93]
       mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9aea4b]
       mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x76A) [0xb7ca1a]
       mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB25) [0xa8fcd5]
       mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e41f0]
       mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x411) [0xed1381]
       libpthread.so.0(+0x7F18) [0x7fb3acf74f18]
       libc.so.6(clone+0x6D) [0x7fb3ac086b9d]
      -----  END BACKTRACE  -----
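
The mangled mongod frames above are easier to follow once the qualified function names are extracted. The following is a minimal Python sketch, not a full demangler: it assumes the simple Itanium C++ ABI nested-name shape (`_ZN`, then length-prefixed identifiers, optionally a constructor or destructor tag) and ignores parameter types. The helper names `qualified_name` and `frame_symbol` are made up for this illustration; a real tool such as `c++filt` gives complete results.

```python
import re

def qualified_name(mangled):
    """Extract 'ns::Class::func' from a simple Itanium nested name.

    Only handles `_ZN<len><id>...` followed by an optional C1/C2/C3 or
    D0/D1/D2 tag; templates, substitutions and parameters are ignored.
    """
    if not mangled.startswith("_ZN"):
        return None
    pos = 3
    parts = []
    while pos < len(mangled):
        m = re.match(r"\d+", mangled[pos:])
        tag = mangled[pos:pos + 2]
        if m:
            # Length-prefixed identifier, e.g. "5mongo".
            length = int(m.group())
            start = pos + m.end()
            parts.append(mangled[start:start + length])
            pos = start + length
        elif tag in ("C1", "C2", "C3") and parts:
            parts.append(parts[-1])          # constructor
            pos += 2
        elif tag in ("D0", "D1", "D2") and parts:
            parts.append("~" + parts[-1])    # destructor
            pos += 2
        else:
            break                            # 'E' or parameter encoding
    return "::".join(parts)

def frame_symbol(frame_line):
    """Pull the mangled symbol out of a 'mongod(<sym>+0x..) [0x..]' line."""
    m = re.search(r"\((_Z\w+)\+0x[0-9A-Fa-f]+\)", frame_line)
    return m.group(1) if m else None

# Three frames copied from the backtrace above, innermost first.
frames = [
    "mongod(_ZN5mongo4Lock10GlobalReadC2EPNS_6LockerEj+0x84) [0x9954a4]",
    "mongod(_ZN5mongo17MigrateFromStatus4doneEPNS_16OperationContextE+0x8C) [0xdac52c]",
    "mongod(_ZN5mongo16MoveChunkCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1CE0) [0xdae3f0]",
]
names = [qualified_name(frame_symbol(f)) for f in frames]
for name in names:
    print(name)
```

Read bottom-up, the three names show the exception starting to unwind out of the `Lock::GlobalRead` constructor, which was entered from `MigrateFromStatus::done`, itself reached from `MoveChunkCommand::run`.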
      

      A few more events related to conn60 before the crash:

      2015-01-05T17:47:22.452+0000 I SHARDING [conn60] moveChunk data transfer progress: { active: true, ns: "sbtest.sbtest1", from: "rs2/172.31.32.214:27017,ip-172-31-35-229:27017", min: { _id: -7816322693657637576 }, max: { _id: -7672769179660119751 }, shardKeyPattern: { _id: "hashed" }, state: "clone", counts: { cloned: 1480, clonedBytes: 321160, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 6
      
      2015-01-05T17:47:22.602+0000 I SHARDING [conn60] About to check if it is safe to enter critical section
      2015-01-05T17:47:22.602+0000 E SHARDING [conn60] moveChunk cannot enter critical section before all data is cloned, 81584 locs were not transferred but to-shard reported { active: true, ns: "sbtest.sbtest1", from: "rs2/172.31.32.214:27017,ip-172-31-35-229:27017", min: { _id: -7816322693657637576 }, max: { _id: -7672769179660119751 }, shardKeyPattern: { _id: "hashed" }, state: "clone", counts: { cloned: 1480, clonedBytes: 321160, catchup: 0, steady: 0 }, ok: 1.0 }
      2015-01-05T17:47:22.602+0000 I SHARDING [conn60] MigrateFromStatus::done About to acquire global lock to exit critical section
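
The timestamps on the conn60 lines can be compared directly. The following Python sketch computes the time between the "About to acquire global lock" message and the `terminate()` message quoted earlier in this report. It only measures the gap between those two log lines; the logs shown here do not say what conn60 was doing in between, so the interpretation is left open. The helper name `log_time` is made up for this illustration.

```python
from datetime import datetime

def log_time(line):
    """Parse the leading ISO-style timestamp of a mongod log line."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%f%z")

# Both lines are copied from the log excerpts in this report.
lock_attempt = (
    "2015-01-05T17:47:22.602+0000 I SHARDING [conn60] "
    "MigrateFromStatus::done About to acquire global lock to exit critical section"
)
terminate = (
    "2015-01-05T22:54:49.372+0000 F -        [conn60] "
    "terminate() called. An exception is active; attempting to gather more information"
)

gap = log_time(terminate) - log_time(lock_attempt)
print(gap)  # 5:07:26.770000
```

So a little over five hours separate the last logged moveChunk step on conn60 from the crash on the same connection.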
      

      The setup is:

      • 3 config servers
      • 1 mongos
      • 3 shards, each a two-member replica set
      • wiredTiger storage engine
      • all other options left at their defaults

            Assignee:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Reporter:
            rui.zhang Rui Zhang (Inactive)
            Votes:
            0
            Watchers:
            6