  1. Core Server
  2. SERVER-29310

server crash during chunk split

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.2.13
    • Component/s: Networking
    • Labels:
      None
    • Fully Compatible
    • ALL

      I suspect this will happen again (we have 9 different clusters), but right now I do not have steps to reproduce the problem.

      Four days ago we upgraded from 3.0.7 to 3.2.13. We had no crashes in the preceding year, but we have had 2 crashes on 2 different clusters since the upgrade. The first crash did not produce a stack trace, but the second one did (please see the attached log file). Below are the log from right before the crash and the stack trace. It seems that the segmentation fault happened during a chunk split attempt.

      2017-05-21T09:13:28.466-0400 I SHARDING [conn3787] request split points lookup for chunk postingrecommendation.postingrecommendation { : "TN", : "5913264d50499b0bb4434b24" } -->> { : "TN", : "7f934f001357ce9c0eb72c05" }
      2017-05-21T09:13:28.515-0400 I SHARDING [conn3787] received splitChunk request: { splitChunk: "postingrecommendation.postingrecommendation", keyPattern: { _skp: 1.0, _id: 1.0 }, min: { _skp: "TN", _id: "5913264d50499b0bb4434b24" }, max: { _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }, from: "jsra", splitKeys: [ { _skp: "TN", _id: "59185007638e290ba8e933a5" }, { _skp: "TN", _id: "591cff1650499b0bb44d70df" } ], shardId: "postingrecommendation.postingrecommendation-_skp_"TN"_id_"5913264d50499b0bb4434b24"", configdb: "mgocnf-a.snagprod.corp:27340,mgocnf-b.snagprod.corp:27340,mgocnf-c.snagprod.corp:27340", epoch: ObjectId('527abca8d31d1633acdaa97e') }
      2017-05-21T09:13:28.749-0400 I SHARDING [conn3787] distributed lock 'postingrecommendation.postingrecommendation/mgo-jsra-a.snagprod.corp:27017:1495120452:697408834' acquired for 'splitting chunk [{ _skp: "TN", _id: "5913264d50499b0bb4434b24" }, { _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }) in postingrecommendation.postingrecommendation', ts : 592192782e655bbf91dde53f
      2017-05-21T09:13:28.749-0400 I SHARDING [conn3787] remotely refreshing metadata for postingrecommendation.postingrecommendation based on current shard version 30|12824||527abca8d31d1633acdaa97e, current metadata version is 30|12824||527abca8d31d1633acdaa97e
      2017-05-21T09:13:28.751-0400 I SHARDING [conn3787] metadata of collection postingrecommendation.postingrecommendation already up to date (shard version : 30|12824||527abca8d31d1633acdaa97e, took 2 ms)
      2017-05-21T09:13:28.752-0400 W SHARDING [conn3787] splitChunk cannot find chunk [{ _skp: "TN", _id: "5913264d50499b0bb4434b24" },{ _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }) to split, the chunk boundaries may be stale
      2017-05-21T09:13:28.850-0400 I SHARDING [conn3787] distributed lock 'postingrecommendation.postingrecommendation/mgo-jsra-a.snagprod.corp:27017:1495120452:697408834' unlocked.
      2017-05-21T09:13:28.850-0400 I COMMAND  [conn3787] command admin.$cmd command: splitChunk { splitChunk: "postingrecommendation.postingrecommendation", keyPattern: { _skp: 1.0, _id: 1.0 }, min: { _skp: "TN", _id: "5913264d50499b0bb4434b24" }, max: { _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }, from: "jsra", splitKeys: [ { _skp: "TN", _id: "59185007638e290ba8e933a5" }, { _skp: "TN", _id: "591cff1650499b0bb44d70df" } ], shardId: "postingrecommendation.postingrecommendation-_skp_"TN"_id_"5913264d50499b0bb4434b24"", configdb: "mgocnf-a.snagprod.corp:27340,mgocnf-b.snagprod.corp:27340,mgocnf-c.snagprod.corp:27340", epoch: ObjectId('527abca8d31d1633acdaa97e') } keyUpdates:0 writeConflicts:0 exception: splitChunk cannot find chunk [{ _skp: "TN", _id: "5913264d50499b0bb4434b24" },{ _skp: "TN", _id: "7f934f001357ce9c0eb72c05" }) to split, the chunk boundaries may be stale ( ns : postingrecommendation.postingrecommendation, received : 0|0||000000000000000000000000, wanted : 30|12824||527abca8d31d1633acdaa97e, send ) code:13388 numYields:0 reslen:496 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, MMAPV1Journal: { acquireCount: { w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { W: 1 } } } protocol:op_query 335ms
      2017-05-21T09:13:28.851-0400 I NETWORK  [conn3787] end connection 10.70.18.214:49658 (222 connections now open)
      2017-05-21T09:13:28.866-0400 F -        [thread1] Invalid access at address: 0xffffffffffffffe8
      2017-05-21T09:13:28.997-0400 F -        [thread1] Got signal: 11 (Segmentation fault).
      0x133f4f2 0x133e649 0x133e9c8 0x7f3fd19db330 0x1b514e9 0x1b51ba9 0xa111e9 0xa118b5 0x11f3c03 0x11f5e50 0x1418fc0 0x7f3fd19d2f82 0x7f3fd19d3197 0x7f3fd1700bed
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"F3F4F2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F3E649"},{"b":"400000","o":"F3E9C8"},{"b":"7F3FD19CB000","o":"10330"},{"b":"400000","o":"17514E9","s":"_ZNSo6sentryC2ERSo"},{"b":"400000","o":"1751BA9","s":"_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l"},{"b":"400000","o":"6111E9","s":"_ZN5mongo11PoolForHost4doneEPNS_16DBConnectionPoolEPNS_12DBClientBaseE"},{"b":"400000","o":"6118B5","s":"_ZN5mongo16DBConnectionPool7releaseERKSsPNS_12DBClientBaseE"},{"b":"400000","o":"DF3C03"},{"b":"400000","o":"DF5E50"},{"b":"400000","o":"1018FC0"},{"b":"7F3FD19CB000","o":"7F82"},{"b":"7F3FD19CB000","o":"8197"},{"b":"7F3FD1603000","o":"FDBED","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.13", "gitVersion" : "23899209cad60aaafe114f6aea6cb83025ff51bc", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-112-generic", "version" : "#159-Ubuntu SMP Fri Mar 3 15:26:07 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "B559BDA626A4B7F4A29153D8DA0DAA0B3B48A82B" }, { "b" : "7FFCB9DAB000", "elfType" : 3, "buildId" : "012E1338BA43AF7C0DC7D069F64F0A6490CC6D9C" }, { "b" : "7F3FD28ED000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "48A664AE6B0B4918A3EB0156C6364C4F084232FD" }, { "b" : "7F3FD2511000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "6B8997EA892A7FF37AC8CAA8F239D595251889BB" }, { "b" : "7F3FD2309000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "1EEBA762A6A2C8884D56033EE8CCE79B95CD974D" }, { "b" : "7F3FD2105000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D0F881E59FF88BE4F29A228C8657376B3C325C2C" }, { "b" : "7F3FD1DFF000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1654CB13B1D24ED03F4BDCB51FC7524B9181A771" }, { "b" : "7F3FD1BE9000", "path" : 
"/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F3FD19CB000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "22F9078CFA529CCE1A814A4A1A1C018F169D5652" }, { "b" : "7F3FD1603000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CA5C6CFE528AF541C3C2C15CEE4B3C74DA4E2FB4" }, { "b" : "7F3FD2B4C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "237E22E5AAC2DDFCD06518F63FD720FE758E6E5B" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x133f4f2]
       mongod(+0xF3E649) [0x133e649]
       mongod(+0xF3E9C8) [0x133e9c8]
       libpthread.so.0(+0x10330) [0x7f3fd19db330]
       mongod(_ZNSo6sentryC2ERSo+0x19) [0x1b514e9]
       mongod(_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l+0x29) [0x1b51ba9]
       mongod(_ZN5mongo11PoolForHost4doneEPNS_16DBConnectionPoolEPNS_12DBClientBaseE+0x109) [0xa111e9]
       mongod(_ZN5mongo16DBConnectionPool7releaseERKSsPNS_12DBClientBaseE+0xE5) [0xa118b5]
       mongod(+0xDF3C03) [0x11f3c03]
       mongod(+0xDF5E50) [0x11f5e50]
       mongod(+0x1018FC0) [0x1418fc0]
       libpthread.so.0(+0x7F82) [0x7f3fd19d2f82]
       libpthread.so.0(+0x8197) [0x7f3fd19d3197]
       libc.so.6(clone+0x6D) [0x7f3fd1700bed]
      -----  END BACKTRACE  -----
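
      For readers cross-checking the backtrace, the JSON frames and the raw address line appear to describe the same stack: each frame's "b" (load base of the module) plus "o" (offset) gives the absolute address printed on the first line of the trace. The Python sketch below illustrates that arithmetic on four frames copied from the log above. The function name `absolute_addresses` and the trimmed `BACKTRACE_JSON` constant are illustrative assumptions, not part of any MongoDB tooling.

```python
import json

# Four of the 14 frames, copied from the "backtrace" array in the log above.
# "b" is the module load base and "o" the offset, both hexadecimal strings.
BACKTRACE_JSON = """
{"backtrace":[
  {"b":"400000","o":"F3F4F2","s":"_ZN5mongo15printStackTraceERSo"},
  {"b":"7F3FD19CB000","o":"10330"},
  {"b":"400000","o":"6111E9","s":"_ZN5mongo11PoolForHost4doneEPNS_16DBConnectionPoolEPNS_12DBClientBaseE"},
  {"b":"7F3FD1603000","o":"FDBED","s":"clone"}
]}
"""


def absolute_addresses(doc):
    """Return one (address, symbol_or_None) tuple per frame.

    The address is computed as base + offset; "s" is optional in the log,
    so frames without a symbol yield None.
    """
    frames = json.loads(doc)["backtrace"]
    return [(int(f["b"], 16) + int(f["o"], 16), f.get("s")) for f in frames]


if __name__ == "__main__":
    for addr, sym in absolute_addresses(BACKTRACE_JSON):
        print(f"{addr:#x}  {sym or '(no symbol)'}")
```

      The computed values can be compared by eye with the bracketed addresses in the symbolized frames, for example `[0x133f4f2]` for `printStackTrace` and `[0xa111e9]` for `PoolForHost::done`.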
      
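
      When reading an excerpt like the one above, it can help to separate each line into timestamp, severity, component, context (the bracketed thread or connection name), and message. The following Python sketch does that for three lines copied from the log. The regular expression is an assumption inferred from the lines in this report rather than an official grammar, and the group names and the helpers `parse` and `fatal_lines` are hypothetical.

```python
import re

# Assumed layout, inferred from this excerpt only:
#   <timestamp> <severity letter> <component> [<context>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<severity>[A-Z])\s+(?P<component>\S+)\s+"
    r"\[(?P<context>[^\]]+)\]\s+(?P<message>.*)$"
)

# Three lines copied verbatim from the log above.
SAMPLE = """\
2017-05-21T09:13:28.851-0400 I NETWORK  [conn3787] end connection 10.70.18.214:49658 (222 connections now open)
2017-05-21T09:13:28.866-0400 F -        [thread1] Invalid access at address: 0xffffffffffffffe8
2017-05-21T09:13:28.997-0400 F -        [thread1] Got signal: 11 (Segmentation fault).
"""


def parse(text):
    """Return a dict of fields for every line that matches the assumed layout."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def fatal_lines(entries):
    """Keep only the entries logged at severity F (fatal)."""
    return [e for e in entries if e["severity"] == "F"]


if __name__ == "__main__":
    for entry in fatal_lines(parse(SAMPLE)):
        print(entry["ts"], entry["context"], entry["message"])
```

      On this sample the two fatal entries carry the context `thread1`, whereas the splitChunk messages earlier in the log carry `conn3787`. The sketch only surfaces that difference; it does not by itself establish or rule out a link between the split and the crash.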

            Assignee:
            samantha.ritter@mongodb.com Samantha Ritter (Inactive)
            Reporter:
            rfehrmann@snagajob.com Robert Fehrmann
            Votes:
            1
            Watchers:
            11

              Created:
              Updated:
              Resolved: