[SERVER-24358] mongodb removeShard wedged if computed average record size is zero Created: 02/Jun/16  Updated: 06/Dec/22  Resolved: 19/Dec/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.7, 3.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dan Doyle Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Sharding
Operating System: ALL
Participants:

 Description   

We currently have a setup with 3 shards that we are trying to consolidate down to 1 shard. After reviewing the documentation we started the process; it seemed to go fine, and it spent the next several days draining chunks.

It has now been stuck at this point for a long time:

db.getSiblingDB('admin').runCommand( { removeShard: "shard0001" } );
{
  "msg": "draining ongoing",
  "state": "ongoing",
  "remaining": {
    "chunks": NumberLong("8"),
    "dbs": NumberLong("0")
  },
  "ok": 1
}

After a lot of reading it would seem that this is because some of the chunks are indivisible or too big. Looking at the output of

db.chunks.find({shard: "shard0001"})

shows that they are all marked as jumbo, which makes sense from the docs. However, running the following shows that they aren't actually big, just possibly indivisible:

interface> var key = { identifier: 1, start: 1, end: 1 };   // shard key pattern, inferred from the chunk bounds
interface> db.getSiblingDB("config").chunks.find({ ns: "interface.data_3600", shard: "shard0001" }).forEach(function(chunk) {
    var ds = db.getSiblingDB("interface").runCommand({ datasize: chunk.ns, keyPattern: key, min: chunk.min, max: chunk.max });
    print("Chunk: " + chunk._id + " has a size of " + ds.size + ", and includes " + ds.numObjects + " objects (took " + ds.millis + "ms)");
});

Output:

Chunk: interface.data_3600-identifier_"eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7"start_1443600000end_1447200000 has a size of 0, and includes 0 objects (took 364ms)
Chunk: interface.data_3600-identifier_"f3bd3220e046b1093b1c976edf4bbdd0a80e11d3a1c08af483654e1929a4617b"start_1447200000end_1450800000 has a size of 0, and includes 0 objects (took 67ms)
Chunk: interface.data_3600-identifier_"f58989626ca054a08a89291eb888ad55997cc44ede5a55bf2723f2a9a6d7dadf"start_1447200000end_1450800000 has a size of 1319889, and includes 4 objects (took 290ms)
Chunk: interface.data_3600-identifier_"f608484cf555af6a1dcd6ef85e24c0b4ca9880b2c8fd459beedd9a6b158bac31"start_1447200000end_1450800000 has a size of 1715344, and includes 5 objects (took 193ms)
Chunk: interface.data_3600-identifier_"f85d1cb52fbda215a828f994aa7da465892c9aa9d6a423d3555797b9a394258e"start_1447200000end_1450800000 has a size of 1888620, and includes 4 objects (took 102ms)
Chunk: interface.data_3600-identifier_"fc89195d311b61c3d38f38ee3d29dcb111a4a4752ca460eb07c305220f45a720"start_1450800000end_1454400000 has a size of 0, and includes 0 objects (took 0ms)
Chunk: interface.data_3600-identifier_"ff26d379eb71a09c56438310289ad4a6556e7e4e4f3b6a14f1a04411c599ae66"start_1447200000end_1450800000 has a size of 1869036, and includes 4 objects (took 70ms)
Chunk: interface.data_3600-identifier_"ffff18518b122a12d14aa6c229c96fb824ae0a04328a80efeb87585deda6920b"start_1450800000end_1454400000 has a size of 1525992, and includes 4 objects (took 49ms)

These all seem to be empty or nearly empty chunks. There was a period where we did a lot of deletions of data, so this may be left over from that. However, I cannot seem to get these chunks to move and finish the draining.
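
For reference, the jumbo marking can be checked directly on the chunk documents; a minimal query along these lines (my assumption being that it is run against the config database through mongos):

db.getSiblingDB("config").chunks.find(
    { ns: "interface.data_3600", shard: "shard0001" },   // chunks still on the draining shard
    { min: 1, max: 1, jumbo: 1 }                         // show only the bounds and the jumbo flag
).forEach(printjson);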

Trying to perform a manual move resulted in the following:

{
  "cause": {
    "chunkTooBig": true,
    "estimatedChunkSize": NumberLong("169236184162472618"),
    "ok": 0,
    "errmsg": "chunk too big to move"
  },
  "ok": 0,
  "errmsg": "move failed"
}

After some more reading, someone suggested deleting the documents in the chunk, clearing the jumbo flag manually, and then moving it. Doing that and running a manual move now shows:

sh.moveChunk("interface.data_3600", {"identifier": "f3bd3220e046b1093b1c976edf4bbdd0a80e11d3a1c08af483654e1929a4617b", "start": NumberLong("1447200000"), "end": NumberLong("1450800000")}, "shard0000");
{
  "cause": {
    "ok": 0,
    "errmsg": "could not acquire collection lock for interface.data_3600 to migrate chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for migrating chunk [{ : MinKey }, { : MaxKey }) in interface.data_3600 is taken."
  },
  "ok": 0,
  "errmsg": "move failed"
}

The lock in question seems to be perpetually stuck, and the regular balancer keeps logging the same thing. In the interest of debugging I tried manually clearing the lock and running the command, at which point mongod on the migrate FROM shard crashed with a floating point exception:

2016-06-02T13:41:01.534+0000 I SHARDING [conn3] received moveChunk request: { moveChunk: "interface.data_3600", from: "tsds.bldc.grnoc.iu.edu:27026", to: "tsds.bldc.grnoc.iu.edu:27025", fromShard: "shard0001", toShard: "shard0000", min: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1443600000, end: 1447200000 }, max: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1447200000, end: 1450800000 }, maxChunkSizeBytes: 268435456, shardId: "interface.data_3600-identifier_"eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7"start_1443600000end_1447200000", configdb: "tsds.bldc.grnoc.iu.edu:27019,tsds.bldc.grnoc.iu.edu:27020,tsds.bldc.grnoc.iu.edu:27021", secondaryThrottle: true, waitForDelete: false, maxTimeMS: 0, epoch: ObjectId('5592cb956e74ba75920e0bda') }
2016-06-02T13:41:01.797+0000 I SHARDING [conn3] distributed lock 'interface.data_3600/tsds.bldc.grnoc.iu.edu:27026:1464874713:1439290629' acquired, ts : 5750376d7a625e027e304639
2016-06-02T13:41:01.797+0000 I SHARDING [conn3] about to log metadata event: { _id: "tsds.bldc.grnoc.iu.edu-2016-06-02T13:41:01-5750376d7a625e027e30463a", server: "tsds.bldc.grnoc.iu.edu", clientAddr: "140.182.45.176:52095", time: new Date(1464874861797), what: "moveChunk.start", ns: "interface.data_3600", details: { min: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1443600000, end: 1447200000 }, max: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1447200000, end: 1450800000 }, from: "shard0001", to: "shard0000" } }
2016-06-02T13:41:01.799+0000 I SHARDING [conn3] remotely refreshing metadata for interface.data_3600, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
2016-06-02T13:41:01.822+0000 I SHARDING [conn3] collection interface.data_3600 was previously unsharded, new metadata loaded with shard version 4642|1||5592cb956e74ba75920e0bda
2016-06-02T13:41:01.822+0000 I SHARDING [conn3] collection version was loaded at version 4642|1||5592cb956e74ba75920e0bda, took 23ms
2016-06-02T13:41:01.822+0000 I SHARDING [conn3] moveChunk request accepted at version 4642|1||5592cb956e74ba75920e0bda
2016-06-02T13:41:01.822+0000 F -        [conn3] Invalid operation at address: 0xdfa98e
2016-06-02T13:41:01.828+0000 F -        [conn3] Got signal: 8 (Floating point exception).
 
 0xf758c9 0xf74f42 0xf7529e 0x3fda60f7e0 0xdfa98e 0xdfcef4 0x9cf824 0x9d07ad 0x9d14bb 0xba2f4a 0xab4830 0x7fbcfd 0xf26b9b 0x3fda607aa1 0x3fda2e893d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B758C9"},{"b":"400000","o":"B74F42"},{"b":"400000","o":"B7529E"},{"b":"3FDA600000","o":"F7E0"},{"b":"400000","o":"9FA98E"},{"b":"400000","o":"9FCEF4"},{"b":"400000","o":"5CF824"},{"b":"400000","o":"5D07AD"},{"b":"400000","o":"5D14BB"},{"b":"400000","o":"7A2F4A"},{"b":"400000","o":"6B4830"},{"b"
:"400000","o":"3FBCFD"},{"b":"400000","o":"B26B9B"},{"b":"3FDA600000","o":"7AA1"},{"b":"3FDA200000","o":"E893D"}],"processInfo":{ "mongodbVersion" : "3.0.7", "gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd", "uname" : { "sysname" : "Linux", "release" : "2.6.32-504.23.4.el6.x86_64", "version" : "#1 SMP Tue Jun 9 20:57:
37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E359E52F4A5E2A16804BCA34965CC29081397CDF" }, { "b" : "7FFF3C0CD000", "elfType" : 3, "buildId" : "FF4CBAAE51A93124ED31C2B1386CE92FF24AEBC3" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C56DD1B811FC0D9263248EBB3
08C73FCBCD80FC1" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "40BEA6554E64FC0C3D5C7D0CD91362730515102F" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "FC4EFD7502ACB3B9D213D28272D15A165857AD5A" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "95159178F1A4A3DBDC7819F
BEA2C80E5FCDD6BAC" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "29B61382141595ECBA6576232E44F2310C3AAB72" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "F07F2E7CF4BFB393CC9BBE8CDC6463652E14DB07" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "989FE3A42CA8CEBDCC185A743896F
23A0CF537ED" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "246C3BAB0AB093AFD59D34C8CBF29E786DE4BE97" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8E6FA4C4B0594C355C1B90C1D49990368C81A040" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "959C5E10A47EE8A633E7681B64B4B9
F74E242ED5" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "54BA6B78A9220344E77463947215E42F0EABCC62" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6797403AA5F8FAD8ADFF683478B45F528CE4FB0E" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "8CE28F280150E62296240E70ECA
C64E4A57AB826" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "05733977F4E41652B86070B27A0CFC2C1EA7719D" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "E3FA235F3BA3F776A01A18ECA7
37C9890F445923" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "C39D7FFB49DFB1B55AD09D1D711AD802123F6623" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "E6798A06BEE17CF102BBA44F
D512FF8B805CEAF1" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf758c9]
 mongod(+0xB74F42) [0xf74f42]
 mongod(+0xB7529E) [0xf7529e]
 libpthread.so.0(+0xF7E0) [0x3fda60f7e0]
 mongod(_ZN5mongo17MigrateFromStatus16storeCurrentLocsEPNS_16OperationContextExRSsRNS_14BSONObjBuilderE+0x36E) [0xdfa98e]
 mongod(_ZN5mongo16MoveChunkCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1CD4) [0xdfcef4]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9cf824]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC1D) [0x9d07ad]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9d14bb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x77A) [0xba2f4a]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xab4830]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x7fbcfd]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf26b9b]
 libpthread.so.0(+0x7AA1) [0x3fda607aa1]
 libc.so.6(clone+0x6D) [0x3fda2e893d]
-----  END BACKTRACE  -----

Upon restarting the process, the lock goes back into a perpetually locked state:

2016-06-02T13:59:58.908+0000 W SHARDING [conn3] could not acquire collection lock for interface.data_3600 to migrate chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for migrating chunk [{ : MinKey }, { : MaxKey }) in interface.data_3600 is taken
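
The lock itself is visible in the config database; a quick way to inspect it (my understanding is that state 0 means unlocked and 2 means held):

// the _id of a collection migration lock is the namespace, as seen in the log above
db.getSiblingDB("config").locks.find({ _id: "interface.data_3600" }).pretty();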

All processes involved have been restarted in case something was in a weird state that a restart would resolve. Our environment is MongoDB 3.0.7 on CentOS 6, 64-bit. At this point upgrading is not an option due to driver concerns, and we are trying to avoid long downtime as this is a production service. The data set is somewhat large (> 1 TB).

Any help would be appreciated. If I can provide anything else I am more than happy to.



 Comments   
Comment by Sheeri Cabral (Inactive) [ 19/Dec/19 ]

This code path is gone.

Comment by Andy Schwerin [ 13/Jun/16 ]

I think we know enough, daldoyle. Thanks.

Comment by Dan Doyle [ 09/Jun/16 ]

Andy,

Success! I was able to run the collection.validate() method. It took a while to run, but afterwards the regular balancer finished moving the last 8 chunks and I was able to successfully take the shard out of the cluster.

As to the "how this happened" or what the behavior ought to be in such circumstances, I will leave up to you. If there is other information I can provide to help debug or reproduce or anything, please let me know.

Thanks very much for your assistance.

Comment by Andy Schwerin [ 09/Jun/16 ]

OK. I suspect that the collection's estimates of total data size and document count got out of sync, perhaps during one of the prior crashes, and that this is tickling a divide-by-zero in code that isn't prepared for them to be out of sync. You might be able to work around it by running validate on the collection in question on the FROM shard, but I haven't built a repro yet, so I don't know for certain.
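
Something along these lines, run directly against the primary of the FROM shard (host and port assumed from the logs above):

// e.g. mongo tsds.bldc.grnoc.iu.edu:27026/interface
db.data_3600.validate(true);   // full validation; should also bring the size and count statistics back in sync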

Comment by Dan Doyle [ 09/Jun/16 ]

Hi Andy,

All data shards are configured with:

storage:
  engine: "wiredTiger"

Comment by Andy Schwerin [ 08/Jun/16 ]

The crash is happening because the collection believes that the total data size in bytes is less than the total number of documents in the collection. As a result, it computes an average record size of 0 bytes and then divides by zero while trying to determine the maximum allowed number of records per chunk.
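
Illustratively (this is not the actual server code, just the shape of the arithmetic, with assumed names and numbers):

// hypothetical out-of-sync statistics for the collection
var dataSizeBytes = 5;             // recorded data size in bytes (too small)
var numRecords    = 10;            // recorded document count
var maxChunkSize  = 268435456;     // 256 MB, as in the moveChunk request above
var avgRecSize = Math.floor(dataSizeBytes / numRecords);   // integer division in the server -> 0
// the server then computes maxChunkSize / avgRecSize to cap the number of records per chunk;
// with avgRecSize == 0 that is an integer division by zero, i.e. SIGFPE (signal 8)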

What's your storage engine?

Comment by Dan Doyle [ 08/Jun/16 ]

Ramon,

Thank you for the response, I appreciate it. I'm afraid I don't follow - is support for the SERVER project a paid service? Or is there something else?

Based on the advice, I have tried lowering the chunksize to 1 and manually clearing the jumbo flag, but I'm afraid it does not appear to make a difference. Watching the log of the draining shard, I see the same as before, which is a lot of:

140.182.45.176:34763", time: new Date(1465417633934), what: "moveChunk.from", ns: "interface.data_3600", details: { min: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1443600000, end: 1447200000 }, max: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1447200000, end: 1450800000 }, step 1 of 6: 0, to: "shard0000", from: "shard0001", note: "aborted", errmsg: "could not acquire collection lock for interface.data_3600 to migrate chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for migrating chunk [{ : M..." } }

so it does seem to be trying periodically. Eventually it forces the lock left over from the previous crash and gets to:

2016-06-08T03:38:17.077+0000 I SHARDING [conn2] forcing lock 'interface.data_3600/tsds.bldc.grnoc.iu.edu:27026:1465355177:373371687' because elapsed time 905631 > takeover time 900000
2016-06-08T03:38:17.199+0000 I SHARDING [conn2] lock 'interface.data_3600/tsds.bldc.grnoc.iu.edu:27026:1465355177:373371687' successfully forced
2016-06-08T03:38:17.472+0000 I SHARDING [conn2] distributed lock 'interface.data_3600/tsds.bldc.grnoc.iu.edu:27026:1465356191:1037486102' acquired, ts : 575793294a77a2816b78676c
2016-06-08T03:38:17.472+0000 I SHARDING [conn2] about to log metadata event: { _id: "tsds.bldc.grnoc.iu.edu-2016-06-08T03:38:17-575793294a77a2816b78676d", server: "tsds.bldc.grnoc.iu.edu", clientAddr: "140.182.45.176:52312", time: new Date(1465357097472), what: "moveChunk.start", ns: "interface.data_3600", details: { min: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1443600000, end: 1447200000 }, max: { identifier: "eb680bf0edff481d8029cda660b46438157ec1af94850abf62bda93a6989f0c7", start: 1447200000, end: 1450800000 }, from: "shard0001", to: "shard0000" } }
2016-06-08T03:38:17.474+0000 I SHARDING [conn2] remotely refreshing metadata for interface.data_3600, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
2016-06-08T03:38:17.498+0000 I SHARDING [conn2] collection interface.data_3600 was previously unsharded, new metadata loaded with shard version 4642|1||5592cb956e74ba75920e0bda
2016-06-08T03:38:17.498+0000 I SHARDING [conn2] collection version was loaded at version 4642|1||5592cb956e74ba75920e0bda, took 23ms
2016-06-08T03:38:17.498+0000 I SHARDING [conn2] moveChunk request accepted at version 4642|1||5592cb956e74ba75920e0bda
2016-06-08T03:38:17.498+0000 F -        [conn2] Invalid operation at address: 0xdfa98e
2016-06-08T03:38:17.504+0000 F -        [conn2] Got signal: 8 (Floating point exception).
 
 0xf758c9 0xf74f42 0xf7529e 0x3fda60f7e0 0xdfa98e 0xdfcef4 0x9cf824 0x9d07ad 0x9d14bb 0xba2f4a 0xab4830 0x7fbcfd 0xf26b9b 0x3fda607aa1 0x3fda2e893d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B758C9"},{"b":"400000","o":"B74F42"},{"b":"400000","o":"B7529E"},{"b":"3FDA600000","o":"F7E0"},{"b":"400000","o":"9FA98E"},{"b":"400000","o":"9FCEF4"},{"b":"400000","o"\
:"5CF824"},{"b":"400000","o":"5D07AD"},{"b":"400000","o":"5D14BB"},{"b":"400000","o":"7A2F4A"},{"b":"400000","o":"6B4830"},{"b":"400000","o":"3FBCFD"},{"b":"400000","o":"B26B9B"},{"b":"3FDA600000","o":\
"7AA1"},{"b":"3FDA200000","o":"E893D"}],"processInfo":{ "mongodbVersion" : "3.0.7", "gitVersion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd", "uname" : { "sysname" : "Linux", "release" : "2.6.32-504.2\
3.4.el6.x86_64", "version" : "#1 SMP Tue Jun 9 20:57:37 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "E359E52F4A5E2A16804BCA34965CC29081397CDF" }, { "b" :\
 "7FFF77233000", "elfType" : 3, "buildId" : "FF4CBAAE51A93124ED31C2B1386CE92FF24AEBC3" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C56DD1B811FC0D9263248EBB308C73FCBCD80FC1" }, {\
 "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "40BEA6554E64FC0C3D5C7D0CD91362730515102F" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "FC4EFD7502ACB3B9D213D2\
8272D15A165857AD5A" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "95159178F1A4A3DBDC7819FBEA2C80E5FCDD6BAC" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "29B6138214159\
5ECBA6576232E44F2310C3AAB72" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "F07F2E7CF4BFB393CC9BBE8CDC6463652E14DB07" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" \
: "989FE3A42CA8CEBDCC185A743896F23A0CF537ED" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "246C3BAB0AB093AFD59D34C8CBF29E786DE4BE97" }, { "path" : "/lib64/libc.so.6", "elfType" : 3,\
 "buildId" : "8E6FA4C4B0594C355C1B90C1D49990368C81A040" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "959C5E10A47EE8A633E7681B64B4B9F74E242ED5" }, { "path" : "/lib64/libgssap\
i_krb5.so.2", "elfType" : 3, "buildId" : "54BA6B78A9220344E77463947215E42F0EABCC62" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6797403AA5F8FAD8ADFF683478B45F528CE4FB0E" }, { "path\
" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "8CE28F280150E62296240E70ECAC64E4A57AB826" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "05733977F4E41652B86070B27A0CFC2C\
1EA7719D" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "E3FA235F3BA3F77\
6A01A18ECA737C9890F445923" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId"\
 : "C39D7FFB49DFB1B55AD09D1D711AD802123F6623" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "E6798A06BEE17CF102BBA44FD512FF8B805CEAF1" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf758c9]
 mongod(+0xB74F42) [0xf74f42]
 mongod(+0xB7529E) [0xf7529e]
 libpthread.so.0(+0xF7E0) [0x3fda60f7e0]
 mongod(_ZN5mongo17MigrateFromStatus16storeCurrentLocsEPNS_16OperationContextExRSsRNS_14BSONObjBuilderE+0x36E) [0xdfa98e]
 mongod(_ZN5mongo16MoveChunkCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x1CD4) [0xdfcef4]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9cf824]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC1D) [0x9d07ad]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9d14bb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x77A) [0xba2f4a]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xab4830]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x7fbcfd]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf26b9b]
 libpthread.so.0(+0x7AA1) [0x3fda607aa1]
 libc.so.6(clone+0x6D) [0x3fda2e893d]
-----  END BACKTRACE  -----

Comment by Ramon Fernandez Marina [ 08/Jun/16 ]

daldoyle, unfortunately we can't provide support on the SERVER project, but we're investigating this behavior to determine if there's a bug in the product.

One possible workaround could be to clear the jumbo flag and lower the chunksize.
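
A rough sketch of that workaround (only a sketch, not a vetted procedure: run it through mongos, keep the balancer stopped while editing config metadata, and back up the config database first; the chunksize value is in MB):

sh.stopBalancer();
db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 1 });   // lower the chunk size to 1 MB
db.getSiblingDB("config").chunks.update(
    { ns: "interface.data_3600", shard: "shard0001", jumbo: true },        // jumbo chunks on the draining shard
    { $unset: { jumbo: "" } },                                             // clear the jumbo flag
    { multi: true }
);
sh.startBalancer();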
