[SERVER-16647] Invariant Failure in SplitChunkCommand::run() Created: 23/Dec/14  Updated: 23/Dec/14  Resolved: 23/Dec/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.8.0-rc3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Adam Midvidy Assignee: Spencer Brody (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

4x EC2 Ubuntu1404 m3.large


Attachments: configsvr-B.log, configsvr-C.log, configsvr-D.log, server-A-mongos.log, shard-B.log, shard-C.log, shard-D-crashed.log
Issue Links:
Duplicate
duplicates SERVER-16498 d_migrate.cpp should not rely on syst... Closed
Operating System: ALL
Steps To Reproduce:

Run this script in a loop on the mongos on server A after adding servers B, C, and D as shards:

db.getSiblingDB("benchdb1").dropDatabase();
sh.enableSharding("benchdb1");
for (var i = 0; i < 64; i++) {
  sh.shardCollection("benchdb1.COL-" + i, {"shardkey": "hashed"});
}

See this GitHub repo for details (Configuration C): https://github.com/amidvidy/mongorestore-benchmarks
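
The report says only that the script is run "in a loop", so the following is a minimal sketch of one way to wrap it, not the reporter's actual harness. The function names `runReproIteration` and `runReproLoop` and the iteration count are assumptions made for illustration; `db` and `sh` are the standard mongo shell globals, passed in explicitly so the functions have no hidden dependencies.

```javascript
// Hypothetical wrapper around the repro script above.
// `db` and `sh` are the mongo shell globals, passed in as arguments.
function runReproIteration(db, sh, numCollections) {
  // Start from a clean slate, then re-enable sharding on the database.
  db.getSiblingDB("benchdb1").dropDatabase();
  sh.enableSharding("benchdb1");
  // Shard each collection on a hashed key, as in the original script.
  for (var i = 0; i < numCollections; i++) {
    sh.shardCollection("benchdb1.COL-" + i, {"shardkey": "hashed"});
  }
}

// Repeat the iteration a fixed number of times. The count is an assumption;
// the report does not say how many iterations were needed to hit the crash.
function runReproLoop(db, sh, iterations) {
  for (var n = 0; n < iterations; n++) {
    runReproIteration(db, sh, 64);
  }
}
```

From a mongo shell connected to the mongos, this could be invoked as `runReproLoop(db, sh, 100);`.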

Participants:

 Description   

This happened on a 4 node cluster while running a benchmark for TOOLS-348.

Cluster topology:
Server A: mongos + mongorestore
Server B,C,D: mongod (standalone shard) + mongod (config server)

The mongod on server D crashes with:

2014-12-23T17:44:29.556+0000 I SHARDING [conn1] distributed lock 'benchdb1.COL-1/ip-10-238-44-130:27018:1419353520:1363056040' acquired, ts : 5499a9fc9a8f659f34bfba11
2014-12-23T17:44:29.556+0000 I SHARDING [conn1] remotely refreshing metadata for benchdb1.COL-1 based on current shard version 1|2||5499a9f700df49298419049a, current metadata version is 1|2||5499a9f700df49298419049a
2014-12-23T17:44:29.557+0000 I SHARDING [conn1] metadata of collection benchdb1.COL-1 already up to date (shard version : 1|2||5499a9f700df49298419049a, took 0ms)
2014-12-23T17:44:29.557+0000 I SHARDING [conn1] splitChunk accepted at version 1|2||5499a9f700df49298419049a
2014-12-23T17:44:29.864+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-238-44-130-2014-12-23T17:44:29-5499a9fd9a8f659f34bfba12", server: "ip-10-238-44-130", clientAddr: "10.233.133.124:41419", time: new Date(1419356669864), what: "split", ns: "benchdb1.COL-1", details: { before: { min: { shardkey: MinKey }, max: { shardkey: -3074457345618258602 } }, left: { min: { shardkey: MinKey }, max: { shardkey: -6148914691236517204 }, lastmod: Timestamp 1000|3, lastmodEpoch: ObjectId('5499a9f700df49298419049a') }, right: { min: { shardkey: -6148914691236517204 }, max: { shardkey: -3074457345618258602 }, lastmod: Timestamp 1000|4, lastmodEpoch: ObjectId('5499a9f700df49298419049a') } } }
2014-12-23T17:44:29.961+0000 I -        [conn1] Invariant failure collection src/mongo/s/d_split.cpp 842
2014-12-23T17:44:29.984+0000 I CONTROL  [conn1] 
 0xf0bd99 0xeb5bb1 0xe9b312 0xdb85dd 0x9a9784 0x9aa5d3 0x9ab08b 0xb77c2a 0xa8af55 0x7e1770 0xec9d61 0x7f94a3128182 0x7f94a2228fbd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B0BD99"},{"b":"400000","o":"AB5BB1"},{"b":"400000","o":"A9B312"},{"b":"400000","o":"9B85DD"},{"b":"400000","o":"5A9784"},{"b":"400000","o":"5AA5D3"},{"b":"400000","o":"5AB08B"},{"b":"400000","o":"777C2A"},{"b":"400000","o":"68AF55"},{"b":"400000","o":"3E1770"},{"b":"400000","o":"AC9D61"},{"b":"7F94A3120000","o":"8182"},{"b":"7F94A212E000","o":"FAFBD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc3", "gitVersion" : "2d679247f17dab05a492c8b6d2c250dab18e54f2", "uname" : { "sysname" : "Linux", "release" : "3.13.0-36-generic", "version" : "#63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFDE3EA000", "elfType" : 3 }, { "b" : "7F94A3120000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F94A2F18000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F94A2D14000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F94A2A10000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F94A270A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F94A24F4000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F94A212E000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F94A333E000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf0bd99]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xeb5bb1]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xB2) [0xe9b312]
 mongod(_ZN5mongo17SplitChunkCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x311D) [0xdb85dd]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9a9784]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9aa5d3]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9ab08b]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x76A) [0xb77c2a]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB25) [0xa8af55]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e1770]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x411) [0xec9d61]
 libpthread.so.0(+0x8182) [0x7f94a3128182]
 libc.so.6(clone+0x6D) [0x7f94a2228fbd]
-----  END BACKTRACE  -----
2014-12-23T17:44:29.984+0000 I -        [conn1] 
 
***aborting after invariant() failure
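
For anyone cross-checking the log, the raw addresses printed just before BEGIN BACKTRACE appear to be each JSON frame's load base ("b") plus its offset ("o"), both in hex. The sketch below illustrates that arithmetic on three frames copied from the backtrace; the helper name `absoluteAddress` is made up for this example and is not part of any MongoDB tooling.

```javascript
// Three frames copied from the "backtrace" array in the log above.
// "b" is the module load base and "o" the offset within it (both hex).
var frames = [
  {"b": "400000", "o": "B0BD99"},       // printStackTrace
  {"b": "400000", "o": "9B85DD"},       // SplitChunkCommand::run
  {"b": "7F94A3120000", "o": "8182"}    // libpthread.so.0
];

// Absolute address = base + offset. BigInt keeps 64-bit values exact.
function absoluteAddress(frame) {
  var addr = BigInt("0x" + frame.b) + BigInt("0x" + frame.o);
  return "0x" + addr.toString(16);
}

var resolved = frames.map(absoluteAddress);
// resolved is ["0xf0bd99", "0xdb85dd", "0x7f94a3128182"]
```

These values match the bracketed addresses shown next to the corresponding symbol lines in the backtrace.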



 Comments   
Comment by Adam Midvidy [ 23/Dec/14 ]

This invariant does not appear to be triggered using an mci build from githash 0e9f78133f2bbff5b215a106eba64ec98cfecf98 (first activated build after the fix for SERVER-16498 was pushed). When/If the benchmark fully completes without incident I will close out this ticket.

EDIT: Closing. Ran into another issue, though.

Comment by Daniel Pasette (Inactive) [ 23/Dec/14 ]

Can you see if the fix made for SERVER-16498 solves the problem?

Comment by Adam Midvidy [ 23/Dec/14 ]

Added logs for mongos, config servers, and shards. Note that the crash occurred on the shard on server D.

Generated at Thu Feb 08 03:41:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.