Core Server / SERVER-49888

splitChunk command fails periodically

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 3.6.14
    • Fix Version/s: Backlog
    • Component/s: None
    • Operating System: ALL

      Description

      The user reports that a splitChunk command fails with the following error at roughly 03:30 (+0800) every day:

      2020-07-20T03:37:45.562+0800 I SHARDING [conn8293] Split chunk { splitChunk: "expMonitordb.taking_order_detail_history", from: "shard1", keyPattern: { orderLogisticsCode: "hashed" }, epoch: ObjectId('5e957f10024dd8e62d233a29'), shardVersion: [ Timestamp(1340, 2338), ObjectId('5e957f10024dd8e62d233a29') ], min: { orderLogisticsCode: -7439110273655194270 }, max: { orderLogisticsCode: -7434037861704948650 }, splitKeys: [ { orderLogisticsCode: -7436857285774910351 }, { orderLogisticsCode: -7434584739396510438 } ] } failed :: caused by :: BadValue: chunk operation commit failed: version 1340|2411||5e957f10024dd8e62d233a29 doesn't exist in namespace: expMonitordb.taking_order_detail_history. Unable to save chunk ops. Command: { applyOps: [ { op: "u", b: true, ns: "config.chunks", o: { _id: "expMonitordb.taking_order_detail_history-orderLogisticsCode_-7439110273655194270", lastmod: Timestamp(1340, 2409), lastmodEpoch: ObjectId('5e957f10024dd8e62d233a29'), ns: "expMonitordb.taking_order_detail_history", min: { orderLogisticsCode: -7439110273655194270 }, max: { orderLogisticsCode: -7436857285774910351 }, shard: "shard1" }, o2: { _id: "expMonitordb.taking_order_detail_history-orderLogisticsCode_-7439110273655194270" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "expMonitordb.taking_order_detail_history-orderLogisticsCode_-7436857285774910351", lastmod: Timestamp(1340, 2410), lastmodEpoch: ObjectId('5e957f10024dd8e62d233a29'), ns: "expMonitordb.taking_order_detail_history", min: { orderLogisticsCode: -7436857285774910351 }, max: { orderLogisticsCode: -7434584739396510438 }, shard: "shard1" }, o2: { _id: "expMonitordb.taking_order_detail_history-orderLogisticsCode_-7436857285774910351" } }, { op: "u", b: true, ns: "config.chunks", o: { _id: "expMonitordb.taking_order_detail_history-orderLogisticsCode_-7434584739396510438", lastmod: Timestamp(1340, 2411), lastmodEpoch: ObjectId('5e957f10024dd8e62d233a29'), ns: "expMonitordb.taking_order_detail_history", min: { orderLogisticsCode: -7434584739396510438 }, max: { orderLogisticsCode: -7434037861704948650 }, shard: "shard1" }, o2: { _id: "expMonitordb.taking_order_detail_history-orderLogisticsCode_-7434584739396510438" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "expMonitordb.taking_order_detail_history", min: { orderLogisticsCode: -7439110273655194270 }, max: { orderLogisticsCode: -7434037861704948650 } }, orderby: { lastmod: -1 } }, res: { lastmodEpoch: ObjectId('5e957f10024dd8e62d233a29'), shard: "shard1" } } ], writeConcern: { w: 0, wtimeout: 0 } }. Result: { got: {}, whatFailed: { ns: "config.chunks", q: { query: { ns: "expMonitordb.taking_order_detail_history", min: { orderLogisticsCode: -7439110273655194270 }, max: { orderLogisticsCode: -7434037861704948650 } }, orderby: { lastmod: -1 } }, res: { lastmodEpoch: ObjectId('5e957f10024dd8e62d233a29'), shard: "shard1" } }, ok: 0.0, errmsg: "preCondition failed", code: 2, codeName: "BadValue", operationTime: Timestamp(1595187465, 1458), $gleStats: { lastOpTime: { ts: Timestamp(1595187465, 1458), t: 3 }, electionId: ObjectId('7fffffff0000000000000003') }, $clusterTime: { clusterTime: Timestamp(1595187465, 1459), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } } :: caused by :: preCondition failed
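
      When triaging this error it can help to compare the version named in the message with the versions carried by the request. The version string in the log, 1340|2411||5e957f10024dd8e62d233a29, appears to follow a major|minor||epoch layout, since it lines up with the Timestamp(1340, 2411) lastmod of the last applyOps entry. The following is a minimal Python sketch under that assumption; ChunkVersion and parse_chunk_version are hypothetical helper names for illustration, not MongoDB APIs, and the values are copied from the log above.

```python
import re
from typing import NamedTuple


class ChunkVersion(NamedTuple):
    """Hypothetical container for a chunk version as printed in the log."""
    major: int
    minor: int
    epoch: str


# Assumed layout: "<major>|<minor>||<24-hex-digit epoch ObjectId>"
_VERSION_RE = re.compile(r"^(\d+)\|(\d+)\|\|([0-9a-f]{24})$")


def parse_chunk_version(text: str) -> ChunkVersion:
    """Parse a version string such as '1340|2411||5e95...' into its parts."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized chunk version: {text!r}")
    return ChunkVersion(int(match.group(1)), int(match.group(2)), match.group(3))


# Version the error message says "doesn't exist".
missing = parse_chunk_version("1340|2411||5e957f10024dd8e62d233a29")

# shardVersion sent with the splitChunk request: Timestamp(1340, 2338).
requested = ChunkVersion(1340, 2338, "5e957f10024dd8e62d233a29")

# Minor versions of the lastmod values in the three applyOps entries.
applied_minors = [2409, 2410, 2411]

print(missing.epoch == requested.epoch)      # True: same collection epoch
print(missing.minor == max(applied_minors))  # True: it is the last new chunk
print(missing.minor - requested.minor)       # 73
```

      The gap of 73 minor versions between the request's shardVersion and the last chunk the commit tried to write is only an observation from this one log line; the sketch does not establish why the preCondition failed.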
      

      In one instance, mongos also crashed with the invariant failure below. This is a case of SERVER-27796.

      2020-07-25T03:29:48.811+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-20-0] Connecting to mongo9.prd.db:21041
      2020-07-25T03:29:48.816+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-20-0] Successfully connected to mongo9.prd.db:21041, took 5ms (1 connections now open to mongo9.prd.db:21041)
      2020-07-25T03:29:48.825+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-23-0] Connecting to mongo9.prd.db:21041
      2020-07-25T03:29:48.831+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-23-0] Successfully connected to mongo9.prd.db:21041, took 6ms (1 connections now open to mongo9.prd.db:21041)
      2020-07-25T03:29:48.832+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-27-0] Connecting to mongo10.prd.db:21041
      2020-07-25T03:29:48.837+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-27-0] Successfully connected to mongo10.prd.db:21041, took 5ms (1 connections now open to mongo10.prd.db:21041)
      2020-07-25T03:30:00.913+0800 F - [conn8293] Invariant failure numDeleted == 1 src/mongo/s/query/cluster_cursor_manager.cpp 626
      2020-07-25T03:30:00.913+0800 F - [conn8293]
       
      ***aborting after invariant() failure
       
       
      2020-07-25T03:30:01.016+0800 F - [conn8293] Got signal: 6 (Aborted).
       
      0x7f6f6cfd6d01 0x7f6f6cfd5f19 0x7f6f6cfd63fd 0x7f6f6b0417e0 0x7f6f6acd04f5 0x7f6f6acd1cd5 0x7f6f6c427818 0x7f6f6c7d75b8 0x7f6f6c7d7816 0x7f6f6c7d7a5d 0x7f6f6c57a053 0x7f6f6c502ed6 0x7f6f6c922376 0x7f6f6c91d7df 0x7f6f6c556335 0x7f6f6c5573c3 0x7f6f6c557aa9 0x7f6f6c476591 0x7f6f6c494aba 0x7f6f6c490417 0x7f6f6c4938a1 0x7f6f6c8f62b2 0x7f6f6c48f250 0x7f6f6c4917e5 0x7f6f6c4920e1 0x7f6f6c49049d 0x7f6f6c4938a1 0x7f6f6c8f6815 0x7f6f6ce99104 0x7f6f6b039aa1 0x7f6f6ad86c4d
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"7F6F6BF32000","o":"10A4D01","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7F6F6BF32000","o":"10A3F19"},{"b":"7F6F6BF32000","o":"10A43FD"},{"b":"7F6F6B032000","o":"F7E0"},{"b":"7F6F6AC9E000","o":"324F5","s":"gsignal"},{"b":"7F6F6AC9E000","o":"33CD5","s":"abort"},{"b":"7F6F6BF32000","o":"4F5818","s":"_ZN5mongo22invariantFailedWithMsgEPKcS1_S1_j"},{"b":"7F6F6BF32000","o":"8A55B8","s":"_ZN5mongo20ClusterCursorManager13_detachCursorENS_8WithLockERKNS_15NamespaceStringEx"},{"b":"7F6F6BF32000","o":"8A5816","s":"_ZN5mongo20ClusterCursorManager13checkInCursorESt10unique_ptrINS_19ClusterClientCursorESt14default_deleteIS2_EERKNS_15NamespaceStringExNS0_11CursorStateE"},{"b":"7F6F6BF32000","o":"8A5A5D","s":"_ZN5mongo20ClusterCursorManager12PinnedCursor12returnCursorENS0_11CursorStateE"},{"b":"7F6F6BF32000","o":"648053","s":"_ZN5mongo11ClusterFind10runGetMoreEPNS_16OperationContextERKNS_14GetMoreRequestE"},{"b":"7F6F6BF32000","o":"5D0ED6"},{"b":"7F6F6BF32000","o":"9F0376","s":"_ZN5mongo12BasicCommand11enhancedRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE"},{"b":"7F6F6BF32000","o":"9EB7DF","s":"_ZN5mongo7Command9publicRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE"},{"b":"7F6F6BF32000","o":"624335"},{"b":"7F6F6BF32000","o":"6253C3"},{"b":"7F6F6BF32000","o":"625AA9","s":"_ZN5mongo8Strategy13clientCommandEPNS_16OperationContextERKNS_7MessageE"},{"b":"7F6F6BF32000","o":"544591","s":"_ZN5mongo23ServiceEntryPointMongos13handleRequestEPNS_16OperationContextERKNS_7MessageE"},{"b":"7F6F6BF32000","o":"562ABA","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE"},{"b":"7F6F6BF32000","o":"55E417","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"7F6F6BF32000","o":"5618A1"},{"b":"7F6F6BF32000","o":"9C42B2","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE"},{"b":"7F6F6BF32000","o":"55D250","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE"},{"b":"7F6F6BF32000","o":"55F7E5","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE"},{"b":"7F6F6BF32000","o":"5600E1","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE"},{"b":"7F6F6BF32000","o":"55E49D","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"7F6F6BF32000","o":"5618A1"},{"b":"7F6F6BF32000","o":"9C4815"},{"b":"7F6F6BF32000","o":"F67104"},{"b":"7F6F6B032000","o":"7AA1"},{"b":"7F6F6AC9E000","o":"E8C4D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.14", "gitVersion" : "cbef87692475857c7ee6e764c8f5104b39c342a1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-696.el6.x86_64", "version" : "#1 SMP Tue Mar 21 19:29:05 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "7F6F6BF32000", "elfType" : 3, "buildId" : "07B9BC06AF673FCB11769568CF4AD7B9B7755E15" }, { "b" : "7FFF5A8D5000", "elfType" : 3, "buildId" : "95F4CB6645DDB9D6C4A3AD4C63CE0C886B8BB9EE" }, { "b" : "7F30DB2F5000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "4786A2A5D30B121601958E84D643C70C13C4FBA5" }, { "b" : "7F30DC4ED000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "552CEC3216281CCFD7FA6432C723D50163255823" }, { "b" : "7F30DCAE9000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "2AF795BFFD122309BA3359FEBABB5D0967403D17" }, { "b" : "7F30DBC65000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "4AAEE970B045D8BF946578B9C7F3AB5CDE9AB44A" }, { "b" : "7F30DA24F000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "EDC925E58FE28DCA536993EB13179C739F1E6566" }, { "b" : "7F30DC832000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "4EA475CD3FD3B69B6C95D9381FA74B36DB4992EF" }, { "b" : "7F30DC89E000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "3E5ABB69E7969FB2C80A7D3637D62395D6C3F827" }, { "b" : "7F30DDD0F000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "97AF4B77212F74CFF72B6C013E6AA2D74A97EF60" }, { "b" : "7F6F6AA90000", "path" : "/lib64/libnss_files.so.2", "elfType" : 3, "buildId" : "787208C787FA89627D16F3DE901184DF2D2C0373" }, { "b" : "7F6F6A88A000", "path" : "/lib64/libnss_dns.so.2", "elfType" : 3, "buildId" : "AEE048FC514B3B527D4CC6DDFA9656BE7E217893" } ] }}
       mongos(_ZN5mongo15printStackTraceERSo+0x41) [0x7f6f6cfd6d01]
       mongos(+0x10A3F19) [0x7f6f6cfd5f19]
       mongos(+0x10A43FD) [0x7f6f6cfd63fd]
       libpthread.so.0(+0xF7E0) [0x7f6f6b0417e0]
       libc.so.6(gsignal+0x35) [0x7f6f6acd04f5]
       libc.so.6(abort+0x175) [0x7f6f6acd1cd5]
       mongos(_ZN5mongo22invariantFailedWithMsgEPKcS1_S1_j+0x0) [0x7f6f6c427818]
       mongos(_ZN5mongo20ClusterCursorManager13_detachCursorENS_8WithLockERKNS_15NamespaceStringEx+0x218) [0x7f6f6c7d75b8]
       mongos(_ZN5mongo20ClusterCursorManager13checkInCursorESt10unique_ptrINS_19ClusterClientCursorESt14default_deleteIS2_EERKNS_15NamespaceStringExNS0_11CursorStateE+0x136) [0x7f6f6c7d7816]
       mongos(_ZN5mongo20ClusterCursorManager12PinnedCursor12returnCursorENS0_11CursorStateE+0x4D) [0x7f6f6c7d7a5d]
       mongos(_ZN5mongo11ClusterFind10runGetMoreEPNS_16OperationContextERKNS_14GetMoreRequestE+0x1C3) [0x7f6f6c57a053]
       mongos(+0x5D0ED6) [0x7f6f6c502ed6]
       mongos(_ZN5mongo12BasicCommand11enhancedRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE+0x76) [0x7f6f6c922376]
       mongos(_ZN5mongo7Command9publicRunEPNS_16OperationContextERKNS_12OpMsgRequestERNS_14BSONObjBuilderE+0x1F) [0x7f6f6c91d7df]
       mongos(+0x624335) [0x7f6f6c556335]
       mongos(+0x6253C3) [0x7f6f6c5573c3]
       mongos(_ZN5mongo8Strategy13clientCommandEPNS_16OperationContextERKNS_7MessageE+0x59) [0x7f6f6c557aa9]
       mongos(_ZN5mongo23ServiceEntryPointMongos13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x5A1) [0x7f6f6c476591]
       mongos(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0xBA) [0x7f6f6c494aba]
       mongos(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x97) [0x7f6f6c490417]
       mongos(+0x5618A1) [0x7f6f6c4938a1]
       mongos(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x1A2) [0x7f6f6c8f62b2]
       mongos(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x150) [0x7f6f6c48f250]
       mongos(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0xB05) [0x7f6f6c4917e5]
       mongos(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x241) [0x7f6f6c4920e1]
       mongos(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x11D) [0x7f6f6c49049d]
       mongos(+0x5618A1) [0x7f6f6c4938a1]
       mongos(+0x9C4815) [0x7f6f6c8f6815]
       mongos(+0xF67104) [0x7f6f6ce99104]
       libpthread.so.0(+0x7AA1) [0x7f6f6b039aa1]
       libc.so.6(clone+0x6D) [0x7f6f6ad86c4d]
      ----- END BACKTRACE -----
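
      For comparing this crash against other reports, the human-readable frames between the BEGIN and END markers can be split into module, mangled symbol, offset and absolute address. The sketch below is one way to do that in Python; Frame and parse_frame are hypothetical helper names, the regular expression only covers the "module(symbol+0xOFFSET) [0xADDRESS]" shape seen in this log, and the three sample lines are copied from the backtrace above. It also checks that the absolute address of the _detachCursor frame equals the "b" (base) plus "o" (offset) values from the JSON backtrace.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    """Hypothetical record for one human-readable backtrace line."""
    module: str
    symbol: Optional[str]  # None when the frame has no symbol name
    offset: int
    address: int


# Matches lines shaped like: mongos(<symbol>+0x218) [0x7f6f6c7d75b8]
_FRAME_RE = re.compile(
    r"^\s*(?P<module>[^\s(]+)"
    r"\((?P<symbol>[^+)]*)\+0x(?P<offset>[0-9A-Fa-f]+)\)"
    r"\s+\[0x(?P<address>[0-9A-Fa-f]+)\]\s*$"
)


def parse_frame(line: str) -> Frame:
    """Split one backtrace line into module, symbol, offset and address."""
    match = _FRAME_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized frame line: {line!r}")
    return Frame(
        module=match["module"],
        symbol=match["symbol"] or None,
        offset=int(match["offset"], 16),
        address=int(match["address"], 16),
    )


SAMPLE = """
 mongos(_ZN5mongo20ClusterCursorManager13_detachCursorENS_8WithLockERKNS_15NamespaceStringEx+0x218) [0x7f6f6c7d75b8]
 mongos(+0x10A3F19) [0x7f6f6cfd5f19]
 libc.so.6(abort+0x175) [0x7f6f6acd1cd5]
"""

frames = [parse_frame(line) for line in SAMPLE.splitlines() if line.strip()]

# JSON backtrace entry for _detachCursor: "b":"7F6F6BF32000", "o":"8A55B8".
base, json_offset = 0x7F6F6BF32000, 0x8A55B8
print(frames[0].address == base + json_offset)  # True
print(frames[1].symbol)                         # None
print(frames[2].module)                         # libc.so.6
```

      The mangled names can then be passed to a demangler such as c++filt to obtain readable C++ signatures.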
      

              People

              Assignee:
              backlog-server-sharding Backlog - Sharding Team
              Reporter:
              601290552@qq.com jing xu
