[SERVER-10757] chunk migration stuck Created: 13/Sep/13  Updated: 11/Jul/16  Resolved: 04/Nov/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Idan Kamara Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu


Attachments: File mongoc.tar.gz     File mongoc.tar.gz     File mongos.log     File shard0000-log.gz     File shard0001-log    
Operating System: Linux
Participants:

 Description   

Setup: 3 config servers, 3 mongod and 3 mongos.

Last night I sharded 3 collections (hashed), one after the other. Documents seemed to move from the main shard to the others as time passed. This morning I checked whether the chunk migrations had finished; the first two had, but the third got stuck. I looked at 'sh.status()', and it showed this:

abc.ankara_extractor
        shard key: { "_id" : "hashed" }
        chunks:
                shard0000       35
                shard0001       31
                shard0002       32
 
abc.ankara_parser
        shard key: { "_id" : "hashed" }
        chunks:
                shard0000       49
                shard0002       39
                shard0001       39

Not perfectly balanced, but whatever.
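
For context, hashed sharding of these collections would have been enabled with shell commands along these lines (a sketch, not copied from the ticket; the database and collection names are the ones shown above):

mongos> sh.enableSharding("abc")                                    // hashed shard keys are available as of 2.4
mongos> sh.shardCollection("abc.ankara_extractor", { _id: "hashed" })
mongos> sh.shardCollection("abc.ankara_parser", { _id: "hashed" })
mongos> sh.shardCollection("abc.jakarta_companies", { _id: "hashed" })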

Moving on to the third collection, nothing changed:

abc.jakarta_companies
        shard key: { "_id" : "hashed" }
        chunks:
                shard0000       143

At this point I stopped all mongos instances, and started just one. It showed this in the log:

Fri Sep 13 08:38:59.567 [Balancer]  ns: abc.ankara_parser going to move { _id: "abc.ankara_parser-_id_MinKey", lastmod: Timestamp 78000|1, lastmodEpoch: ObjectId('523220e66f82e5ddc385722d'), ns: "abc.ankara_parser", min: { _id: MinKey }, max: { _id: -9077305884563950534 }, shard: "shard0000" } from: shard0000 to: shard0001 tag []
Fri Sep 13 08:38:59.570 [Balancer]  ns: abc.jakarta_companies going to move { _id: "abc.jakarta_companies-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('523222126f82e5ddc3857253'), ns: "abc.jakarta_companies", min: { _id: MinKey }, max: { _id: -9094728535907602479 }, shard: "shard0000" } from: shard0000 to: shard0001 tag []
Fri Sep 13 08:38:59.571 [Balancer] moving chunk ns: abc.ankara_parser moving ( ns:abc.ankara_parsershard: shard0000:mongod1.abc.com:27018lastmod: 78|1||000000000000000000000000min: { _id: MinKey }max: { _id: -9077305884563950534 }) shard0000:mongod1.abc.com:27018 -> shard0001:mongod2.abc.com:27018
Fri Sep 13 08:38:59.656 [Balancer] moveChunk result: { who: { _id: "abc.ankara_parser", process: "ip-10-151-124-58:27018:1378994641:1905410576", state: 2, ts: ObjectId('5232610423a6193a37ce7a0a'), when: new Date(1379033348967), who: "ip-10-151-124-58:27018:1378994641:1905410576:conn751:1336024233", why: "migrate-{ _id: 7965667869890547469 }" }, ok: 0.0, errmsg: "the collection metadata could not be locked with lock migrate-{ _id: MinKey }" }
Fri Sep 13 08:38:59.657 [Balancer] balancer move failed: { who: { _id: "abc.ankara_parser", process: "ip-10-151-124-58:27018:1378994641:1905410576", state: 2, ts: ObjectId('5232610423a6193a37ce7a0a'), when: new Date(1379033348967), who: "ip-10-151-124-58:27018:1378994641:1905410576:conn751:1336024233", why: "migrate-{ _id: 7965667869890547469 }" }, ok: 0.0, errmsg: "the collection metadata could not be locked with lock migrate-{ _id: MinKey }" } from: shard0000 to: shard0001 chunk:  min: { _id: MinKey } max: { _id: -9077305884563950534 }
Fri Sep 13 08:38:59.657 [Balancer] moving chunk ns: abc.jakarta_companies moving ( ns:abc.jakarta_companiesshard: shard0000:mongod1.abc.com:27018lastmod: 1|0||000000000000000000000000min: { _id: MinKey }max: { _id: -9094728535907602479 }) shard0000:mongod1.abc.com:27018 -> shard0001:mongod2.abc.com:27018

Running jakarta_companies.count() on the 3 mongods shows:

shard0000: 3895500
shard0001: 27257
shard0002: collection doesn't exist

and it's been stuck like this since then.
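
For reference, per-shard counts like these come from connecting to each shard's mongod directly rather than through mongos (a sketch; shown for shard0000, repeated on each shard):

shard0000> use abc
shard0000> db.jakarta_companies.count()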

Other info that might be useful:

mongos> db.currentOp()                                                                                                                                                                                      
{
        "inprog" : [
                {
                        "opid" : "shard0000:88205450",
                        "active" : true,
                        "secs_running" : 4510,
                        "op" : "query",
                        "ns" : "abc.jakarta_companies",
                        "query" : {
                                "moveChunk" : "abc.jakarta_companies",
                                "from" : "mongod1.abc.com:27018",
                                "to" : "mongod2.abc.com:27018",
                                "fromShard" : "shard0000",
                                "toShard" : "shard0001",
                                "min" : {
                                        "_id" : { "$minKey" : 1 }
                                },
                                "max" : {
                                        "_id" : NumberLong("-9094728535907602479")
                                },
                                "maxChunkSizeBytes" : NumberLong(67108864),
                                "shardId" : "abc.jakarta_companies-_id_MinKey",
                                "configdb" : "mongoc1.abc.com:27019,mongoc2.abc.com:27019,mongoc3.abc.com:27019",
                                "secondaryThrottle" : true,
                                "waitForDelete" : false
                        },
                        "client_s" : "10.151.90.78:33638",
                        "desc" : "conn755",
                        "threadId" : "0x7e3fbb5ca700",
                        "connectionId" : 755,
                        "waitingForLock" : false,
                        "msg" : "step3 of 6",
                        "numYields" : 0,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(241859),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(6),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}

configsvr> db.locks.find()
{ "_id" : "configUpgrade", "process" : "ip-10-36-21-61:27017:1372080743:1804289383", "state" : 0, "ts" : ObjectId("51c84a67f095c43a2590802d"), "when" : ISODate("2013-06-24T13:32:23.851Z"), "who" : "ip-10-36-21-61:27017:1372080743:1804289383:mongosMain:846930886", "why" : "upgrading config database to new format v4" }
{ "_id" : "abc.scraped_website_resource", "process" : "ip-10-151-25-52:27018:1372842489:2092651466", "state" : 0, "ts" : ObjectId("51fe0eb2bd9a06e052152f3c"), "when" : ISODate("2013-08-04T08:20:02.526Z"), "who" : "ip-10-151-25-52:27018:1372842489:2092651466:conn15229:2046301240", "why" : "migrate-{ _id: MinKey }" }
{ "_id" : "abc.parsed_website_resource", "process" : "ip-10-190-202-133:27018:1378999289:1601254270", "state" : 0, "ts" : ObjectId("52321fdf0fe09cb3ce334466"), "when" : ISODate("2013-09-12T20:11:11.607Z"), "who" : "ip-10-190-202-133:27018:1378999289:1601254270:conn672:1819358977", "why" : "migrate-{ _id: -9221392181608325863 }" }
{ "_id" : "abc.website_catalog", "process" : "ip-10-190-202-133:27018:1372842483:2063823080", "state" : 0, "ts" : ObjectId("51f64ff6615675a179f8fbb1"), "when" : ISODate("2013-07-29T11:20:22.627Z"), "who" : "ip-10-190-202-133:27018:1372842483:2063823080:conn22035:1126107449", "why" : "migrate-{ _id: MinKey }" }
{ "_id" : "balancer", "process" : "ip-10-151-90-78:27017:1379061538:1804289383", "state" : 2, "ts" : ObjectId("5232cf22e84b73ca42025328"), "when" : ISODate("2013-09-13T08:38:58.442Z"), "who" : "ip-10-151-90-78:27017:1379061538:1804289383:Balancer:846930886", "why" : "doing balance round" }
{ "_id" : "abc.santiago_extractor", "process" : "ip-10-151-124-58:27018:1372842478:1247094076", "state" : 0, "ts" : ObjectId("51d97b45de466c4e9210a0d3"), "when" : ISODate("2013-07-07T14:29:25.296Z"), "who" : "ip-10-151-124-58:27018:1372842478:1247094076:conn16061:1645155834", "why" : "migrate-{ _id: MinKey }" }
{ "_id" : "abc.santiago_parser", "process" : "ip-10-190-202-133:27018:1372842483:2063823080", "state" : 0, "ts" : ObjectId("51d3ec4d615675a179f8ed66"), "when" : ISODate("2013-07-03T09:18:05.062Z"), "who" : "ip-10-190-202-133:27018:1372842483:2063823080:conn15192:1397128088", "why" : "split-{ _id: 6148914691236517204 }" }
{ "_id" : "abc.jakarta_contacts", "process" : "ip-10-151-124-58:27018:1372842478:1247094076", "state" : 0, "ts" : ObjectId("51fe51f1de466c4e9210c20a"), "when" : ISODate("2013-08-04T13:06:57.282Z"), "who" : "ip-10-151-124-58:27018:1372842478:1247094076:conn290924:508144576", "why" : "migrate-{ data.companyId: 2958941370527843639 }" }
{ "_id" : "abc.feature_results_test", "process" : "ip-10-151-2-120:27017:1377505487:1804289383", "state" : 0, "ts" : ObjectId("523217c68ff15d76d1012d9c"), "when" : ISODate("2013-09-12T19:36:38.624Z"), "who" : "ip-10-151-2-120:27017:1377505487:1804289383:conn412:1714636915", "why" : "drop" }
{ "_id" : "abc.ankara_extractor", "process" : "ip-10-151-124-58:27018:1378994641:1905410576", "state" : 0, "ts" : ObjectId("52325fbd23a6193a37ce7a02"), "when" : ISODate("2013-09-13T00:43:41.655Z"), "who" : "ip-10-151-124-58:27018:1378994641:1905410576:conn744:621698639", "why" : "migrate-{ _id: 9200307451568432088 }" }
{ "_id" : "abc.ankara_parser", "process" : "ip-10-151-124-58:27018:1378994641:1905410576", "state" : 0, "ts" : ObjectId("5232610423a6193a37ce7a0a"), "when" : ISODate("2013-09-13T00:49:08.967Z"), "who" : "ip-10-151-124-58:27018:1378994641:1905410576:conn751:1336024233", "why" : "migrate-{ _id: 7965667869890547469 }" }
{ "_id" : "abc.jakarta_companies", "process" : "ip-10-151-124-58:27018:1378994641:1905410576", "state" : 2, "ts" : ObjectId("5232cf1923a6193a37ce7a0f"), "when" : ISODate("2013-09-13T08:38:49.804Z"), "who" : "ip-10-151-124-58:27018:1378994641:1905410576:conn755:78850494", "why" : "migrate-{ _id: MinKey }" }

Since then I've tried restarting mongos, restarting the config servers, and restarting the balancer (sh.stopBalancer/sh.startBalancer). Nothing causes the migration to resume.
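
For reference, the balancer restart mentioned above uses the standard shell helpers, run against a mongos (a sketch):

mongos> sh.stopBalancer()       // blocks until the current balancing round finishes (or times out)
mongos> sh.getBalancerState()   // should now return false
mongos> sh.startBalancer()
mongos> sh.isBalancerRunning()  // true only while a balancing round is in progress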

How do I get out of this jam?



 Comments   
Comment by Idan Kamara [ 15/Sep/13 ]

The balancer seemed to recover, and chunk migration finished successfully. Still not sure what caused the initial halt two days ago.

Thanks for all your help!

Comment by Scott Hernandez (Inactive) [ 14/Sep/13 ]

It is normal for that to time out if there is activity. It will not wait forever for the balancer to stop, and the timeout is many times shorter than the time an active balancing round would take to finish moving chunks.

As long as things are moving and getting balanced, I wouldn't spend any time or concern on it.
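
For reference, balancer and migration activity can be verified manually from the config database, along these lines (a sketch, run via mongos):

mongos> use config
mongos> db.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(5)   // most recent migration events
mongos> db.locks.find({ _id: "balancer" })                                     // state 2 means the lock is held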

Comment by Idan Kamara [ 14/Sep/13 ]

OK, I let it rest overnight and issued 'sh.setBalancerState(true)' just now; chunks seem to be moving. Will report back in a few hours.

Comment by Idan Kamara [ 13/Sep/13 ]

mongos> db.currentOp()
{ "inprog" : [ ] }

Comment by Scott Hernandez (Inactive) [ 13/Sep/13 ]

Can you post db.currentOp() information and logs from this mongos ("ip-10-151-90-78:27017"), which last held the balancer lock?

Comment by Idan Kamara [ 13/Sep/13 ]

Attached. Nothing has happened since I restarted. Tried to stop the balancer and it said:

mongos> sh.stopBalancer()
Waiting for active hosts...
Waiting for active host ip-10-236-148-221:27017 to recognize new settings... (ping : Fri Sep 13 2013 08:38:18 GMT+0000 (UTC))
Waited for active ping to change for host ip-10-236-148-221:27017, a migration may be in progress or the host may be down.
Waiting for active host ip-10-151-2-120:27017 to recognize new settings... (ping : Fri Sep 13 2013 08:38:20 GMT+0000 (UTC))
Waited for active ping to change for host ip-10-151-2-120:27017, a migration may be in progress or the host may be down.
Waiting for the balancer lock...
assert.soon failed: function (){
        var lock = db.getSisterDB( "config" ).locks.findOne({ _id : lockId })
 
        if( state == false ) return ! lock || lock.state == 0
        if( state == true ) return lock && lock.state == 2
        if( state == undefined ) return (beginTS == undefined && lock) ||
                                        (beginTS != undefined && ( !lock || lock.ts + "" != beginTS + "" ) )
        else return lock && lock.state == state
    }, msg:Waited too long for lock balancer to unlock
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.soon (src/mongo/shell/assert.js:110:60)
    at Function.sh.waitForDLock (src/mongo/shell/utils_sh.js:156:12)
    at Function.sh.waitForBalancerOff (src/mongo/shell/utils_sh.js:224:12)
    at Function.sh.waitForBalancer (src/mongo/shell/utils_sh.js:254:12)
    at Function.sh.stopBalancer (src/mongo/shell/utils_sh.js:126:8)
    at (shell):1:4
Balancer still may be active, you must manually verify this is not the case using the config.changelog collection.
Fri Sep 13 19:20:29.131 JavaScript execution failed: assert.soon failed: function (){
        var lock = db.getSisterDB( "config" ).locks.findOne({ _id : lockId })
 
        if( state == false ) return ! lock || lock.state == 0
        if( state == true ) return lock && lock.state == 2
        if( state == undefined ) return (beginTS == undefined && lock) ||
                                        (beginTS != undefined && ( !lock || lock.ts + "" != beginTS + "" ) )
        else return lock && lock.state == state
    }, msg:Waited too long for lock balancer to unlock at src/mongo/shell/utils_sh.js:L228
mongos>

Comment by Scott Hernandez (Inactive) [ 13/Sep/13 ]

If you can post a new mongodump of the config db, we can see when the move happened. Are things now moving cleanly and starting to balance more evenly?

Restarting everything is a more dramatic way of resetting, but would effectively do the same thing as killing the moveChunk command.

Comment by Idan Kamara [ 13/Sep/13 ]

I tried something else instead: I shut down all the mongod, mongoc, and mongos processes and started them up again. Interestingly, 'sh.status()' now reports that one chunk did in fact move:

abc.jakarta_companies
        shard key: { "_id" : "hashed" }
        chunks:
                shard0001       1
                shard0000       142

But now nothing seems to be happening, even though 'sh.isBalancerRunning()' and 'sh.getBalancerState()' both return true. All nodes report nothing for db.currentOp().

Comment by Idan Kamara [ 13/Sep/13 ]

Should I kill it?

Comment by Idan Kamara [ 13/Sep/13 ]

mongos> db.currentOp()
{
        "inprog" : [
                {
                        "opid" : "shard0000:88206245",
                        "active" : true,
                        "secs_running" : 27852,
                        "op" : "query",
                        "ns" : "abc.ankara_extractor",
                        "query" : {
                                "moveChunk" : "abc.ankara_extractor",
                                "from" : "mongod1.abc.com:27018",
                                "to" : "mongod2.abc.com:27018",
                                "fromShard" : "shard0000",
                                "toShard" : "shard0001",
                                "min" : {
                                        "_id" : NumberLong("-8652435924066214107")
                                },
                                "max" : {
                                        "_id" : NumberLong("-8460567550944533944")
                                },
                                "maxChunkSizeBytes" : NumberLong(67108864),
                                "shardId" : "abc.ankara_extractor-_id_-8652435924066214107",
                                "configdb" : "mongoc1.abc.com:27019,mongoc2.abc.com:27019,mongoc3.abc.com:27019",
                                "secondaryThrottle" : true,
                                "waitForDelete" : false
                        },
                        "client_s" : "10.151.90.78:33722",
                        "desc" : "conn770",
                        "threadId" : "0x7e3fbcbe0700",
                        "connectionId" : 770,
                        "waitingForLock" : false,
                        "msg" : "step3 of 6",
                        "numYields" : 136,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(77746),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(364155),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}

Comment by Scott Hernandez (Inactive) [ 13/Sep/13 ]

Can you post the output before you kill it?

db.killOp("shard0000:88206245"); // connected via mongos
//or
db.killOp("88206245"); // connected directly to the mongod of shard0000

Comment by Idan Kamara [ 13/Sep/13 ]

It is still active, the output is exactly the same (except for the timings). How do I kill it?

Comment by Scott Hernandez (Inactive) [ 13/Sep/13 ]

Is this moveChunk still active in currentOp()? If so, can you post the output? Then you can kill it if you like. It will get restarted when the balancer runs against that collection again, so there is no real harm in stopping it; the system is built to recover and retry.

If this happens again and is reproducible, we can turn up the log level and collect logs to see if more information can be found there.
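
For reference, the log level can be raised at runtime on the affected mongod, roughly like this (a sketch; the setting reverts on restart):

// connected directly to the mongod of shard0000
db.adminCommand({ setParameter: 1, logLevel: 2 })   // 0 is the default, 5 is most verbose
// ...reproduce the stall and collect logs, then restore:
db.adminCommand({ setParameter: 1, logLevel: 0 })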

Comment by Idan Kamara [ 13/Sep/13 ]

> Also, I would suggest you look at the hardware (disk IO, CPU) stats for those shards to see if there is an underlying bottleneck or other indication of something wrong that could be affecting this.

Looking at top, iostat, mongostat, and mongotop suggests nothing is happening.
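
For reference, those checks can be run against the shard hosts along these lines (a sketch; host name assumed from the ticket):

mongostat --host mongod1.abc.com --port 27018
mongotop --host mongod1.abc.com --port 27018
iostat -x 2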

Comment by Idan Kamara [ 13/Sep/13 ]

log files

Comment by Idan Kamara [ 13/Sep/13 ]

It does seem active, but nothing has happened for at least a few hours.

db.currentOp():

shard0000> db.currentOp()
{
        "inprog" : [
                {
                        "opid" : 88206245,
                        "active" : true,
                        "secs_running" : 2223,
                        "op" : "query",
                        "ns" : "abc.ankara_extractor",
                        "query" : {
                                "moveChunk" : "abc.ankara_extractor",
                                "from" : "mongod1.abc.com:27018",
                                "to" : "mongod2.abc.com:27018",
                                "fromShard" : "shard0000",
                                "toShard" : "shard0001",
                                "min" : {
                                        "_id" : NumberLong("-8652435924066214107")
                                },
                                "max" : {
                                        "_id" : NumberLong("-8460567550944533944")
                                },
                                "maxChunkSizeBytes" : NumberLong(67108864),
                                "shardId" : "abc.ankara_extractor-_id_-8652435924066214107",
                                "configdb" : "mongoc1.abc.com:27019,mongoc2.abc.com:27019,mongoc3.
abc.com:27019",
                                "secondaryThrottle" : true,
                                "waitForDelete" : false
                        },
                        "client" : "10.151.90.78:33722",
                        "desc" : "conn770",
                        "threadId" : "0x7e3fbcbe0700",
                        "connectionId" : 770,
                        "waitingForLock" : false,
                        "msg" : "step3 of 6",
                        "numYields" : 136,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(77746),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(364155),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}
 
 
shard0000> db.currentOp()
{
        "inprog" : [
                {
                        "opid" : 88206245,
                        "active" : true,
                        "secs_running" : 3026,
                        "op" : "query",
                        "ns" : "abc.ankara_extractor",
                        "query" : {
                                "moveChunk" : "abc.ankara_extractor",
                                "from" : "mongod1.abc.com:27018",
                                "to" : "mongod2.abc.com:27018",
                                "fromShard" : "shard0000",
                                "toShard" : "shard0001",
                                "min" : {
                                        "_id" : NumberLong("-8652435924066214107")
                                },
                                "max" : {
                                        "_id" : NumberLong("-8460567550944533944")
                                },
                                "maxChunkSizeBytes" : NumberLong(67108864),
                                "shardId" : "abc.ankara_extractor-_id_-8652435924066214107",
                                "configdb" : "mongoc1.abc.com:27019,mongoc2.abc.com:27019,mongoc3.abc.com:27019",
                                "secondaryThrottle" : true,
                                "waitForDelete" : false
                        },
                        "client" : "10.151.90.78:33722",
                        "desc" : "conn770",
                        "threadId" : "0x7e3fbcbe0700",
                        "connectionId" : 770,
                        "waitingForLock" : false,
                        "msg" : "step3 of 6",
                        "numYields" : 136,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(77746),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(364155),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}
 
mongos> db.currentOp()
{
        "inprog" : [
                {
                        "opid" : "shard0000:88206245",
                        "active" : true,
                        "secs_running" : 3575,
                        "op" : "query",
                        "ns" : "abc.ankara_extractor",
                        "query" : {
                                "moveChunk" : "abc.ankara_extractor",
                                "from" : "mongod1.abc.com:27018",
                                "to" : "mongod2.abc.com:27018",
                                "fromShard" : "shard0000",
                                "toShard" : "shard0001",
                                "min" : {
                                        "_id" : NumberLong("-8652435924066214107")
                                },
                                "max" : {
                                        "_id" : NumberLong("-8460567550944533944")
                                },
                                "maxChunkSizeBytes" : NumberLong(67108864),
                                "shardId" : "abc.ankara_extractor-_id_-8652435924066214107",
                                "configdb" : "mongoc1.abc.com:27019,mongoc2.abc.com:27019,mongoc3.abc.com:27019",
                                "secondaryThrottle" : true,
                                "waitForDelete" : false
                        },
                        "client_s" : "10.151.90.78:33722",
                        "desc" : "conn770",
                        "threadId" : "0x7e3fbcbe0700",
                        "connectionId" : 770,
                        "waitingForLock" : false,
                        "msg" : "step3 of 6",
                        "numYields" : 136,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(77746),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(364155),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}
 
mongos> db.currentOp()
{
        "inprog" : [
                {
                        "opid" : "shard0000:88206245",
                        "active" : true,
                        "secs_running" : 3836,
                        "op" : "query",
                        "ns" : "abc.ankara_extractor",
                        "query" : {
                                "moveChunk" : "abc.ankara_extractor",
                                "from" : "mongod1.abc.com:27018",
                                "to" : "mongod2.abc.com:27018",
                                "fromShard" : "shard0000",
                                "toShard" : "shard0001",
                                "min" : {
                                        "_id" : NumberLong("-8652435924066214107")
                                },
                                "max" : {
                                        "_id" : NumberLong("-8460567550944533944")
                                },
                                "maxChunkSizeBytes" : NumberLong(67108864),
                                "shardId" : "abc.ankara_extractor-_id_-8652435924066214107",
                                "configdb" : "mongoc1.abc.com:27019,mongoc2.abc.com:27019,mongoc3.abc.com:27019",
                                "secondaryThrottle" : true,
                                "waitForDelete" : false
                        },
                        "client_s" : "10.151.90.78:33722",
                        "desc" : "conn770",
                        "threadId" : "0x7e3fbcbe0700",
                        "connectionId" : 770,
                        "waitingForLock" : false,
                        "msg" : "step3 of 6",
                        "numYields" : 136,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(77746),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(364155),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}
 

Comment by Scott Hernandez (Inactive) [ 13/Sep/13 ]

What you have posted actually shows that the moveChunk is active, as seen in the logs and in the active moveChunk command in currentOp.

Please attach the following:

  • mongodump of your config db (taken from a config server or via mongos)
  • logs for shard0000 and shard0001 (primary members if a replica set)
  • db.currentOp() from all shards and via one mongos (at least two samples taken more than 3 minutes apart)
  • your MMS group name or URL, if you are using MMS to monitor your sharded cluster.

Also, I would suggest you look at the hardware (disk IO, CPU) stats for those shards to see if there is an underlying bottleneck or other indication of something wrong that could be affecting this.
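
For reference, these diagnostics can be collected along these lines (a sketch; config server and shard host names as shown earlier in the ticket):

# dump the config database from one config server
mongodump --host mongoc1.abc.com --port 27019 --db config --out config-dump

# take two currentOp() samples a few minutes apart on each shard primary (and likewise via a mongos)
mongo mongod1.abc.com:27018 --eval "printjson(db.currentOp())" > shard0000-currentop-1.json
sleep 200
mongo mongod1.abc.com:27018 --eval "printjson(db.currentOp())" > shard0000-currentop-2.json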
