Type: Question
Resolution: Incomplete
Priority: Critical - P2
None
Affects Version/s: 1.8.1
Environment: OS CentOS 5.4. Host hardware: dual-socket Nehalem, 4 cores, 36 GB of memory, and twenty-four 1 TB disks in RAID 10. The new shard has 64 GB of memory and twelve 300 GB disks in RAID 10.
We added a new shard to a 4-shard cluster, bringing it to 5 shards. The cluster is under a very light workload. Judging by the balancer's progress so far, it looks as though rebalancing the chunks across the shards will take 2-3 days to complete.
> db.printShardingStatus();
--- Sharding Status ---
sharding version:
shards:
{
"_id" : "repset_a",
"host" : "repset_a/lmdb-m03.mail.aol.com:7312,lmdb-d02.mail.aol.com:7312,lmdb-d01.mail.aol.com:7312"
}
{
"_id" : "repset_b",
"host" : "repset_b/lmdb-d05.mail.aol.com:7312,lmdb-m06.mail.aol.com:7312,lmdb-d04.mail.aol.com:7312"
}
{
"_id" : "repset_c",
"host" : "repset_c/lmdb-d03.mail.aol.com:7312,lmdb-m02.mail.aol.com:7312,lmdb-m01.mail.aol.com:7312"
}
{
"_id" : "repset_d",
"host" : "repset_d/lmdb-d06.mail.aol.com:7312,lmdb-m05.mail.aol.com:7312,lmdb-m04.mail.aol.com:7312"
}
{
"_id" : "repset_e",
"host" : "repset_e/lmdb-d08.mail.aol.com:7312,lmdb-m09.mail.aol.com:7312,lmdb-d07.mail.aol.com:7312"
}
databases:
MigOidDB.MigOidCol chunks:
repset_e 205
repset_c 1283
repset_a 1283
repset_d 1283
repset_b 1283
too many chunksn to print, use verbose if you want to force print
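The chunk counts above allow a rough sanity check on the 2-3 day estimate. The sketch below is plain JavaScript (the mongo shell's language). It assumes the balancer's end goal is an even split across the five shards and that migrations run one at a time; neither assumption is taken from the balancer's source, and `rebalancePlan` and `secondsPerMigration` are hypothetical helper names.

```javascript
// Chunk counts per shard for MigOidDB.MigOidCol, copied from the
// printShardingStatus() output above.
var chunkCounts = {
  repset_a: 1283,
  repset_b: 1283,
  repset_c: 1283,
  repset_d: 1283,
  repset_e: 205
};

// Assumption: the end state is an even split. The real balancer stops once
// the imbalance drops below its own threshold, so the true count is likely lower.
function rebalancePlan(counts) {
  var shards = Object.keys(counts);
  var total = 0;
  shards.forEach(function (s) { total += counts[s]; });
  var target = Math.floor(total / shards.length);
  // Every chunk above the target on a shard has to be migrated off it.
  var toMove = 0;
  shards.forEach(function (s) {
    if (counts[s] > target) toMove += counts[s] - target;
  });
  return { total: total, target: target, toMove: toMove };
}

// Average time per migration implied by an overall completion estimate,
// assuming migrations run sequentially (one at a time).
function secondsPerMigration(days, migrations) {
  return (days * 24 * 60 * 60) / migrations;
}

var plan = rebalancePlan(chunkCounts);
console.log(plan); // total 5337, target 1067, toMove 864
console.log(secondsPerMigration(2, plan.toMove), secondsPerMigration(3, plan.toMove)); // 200 300
```

Under these assumptions roughly 864 chunks have to move, so a 2-3 day estimate corresponds to about 200-300 seconds per migration, which gives a baseline against which individual migration times can be compared.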
We tried using moveChunk to speed up the process, but the balancer holds the collection's metadata lock and will not allow a manual moveChunk:
> db.adminCommand({moveChunk : "MigOidDB.MigOidCol", find : {_id : "buggzeeann_30324171"}, to : "repset_e"});
{
"cause" : {
"who" : {
"_id" : "MigOidDB.MigOidCol",
"process" : "lmdb-d02.mail.aol.com:1303133674:1828878975",
"state" : 1,
"ts" : ObjectId("4db174cf7a18929e98aa4f5b"),
"when" : ISODate("2011-04-22T12:30:07.322Z"),
"who" : "lmdb-d02.mail.aol.com:1303133674:1828878975:conn34711:1543737603",
"why" : "migrate-
"
},
"errmsg" : "the collection's metadata lock is taken",
"ok" : 0
},
"ok" : 0,
"errmsg" : "move failed"
}
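The failure document nests the real reason under `cause`: the lock for `MigOidDB.MigOidCol` is held for a `migrate-` operation, that is, a balancer migration already in progress. A manual moveChunk can therefore only get the lock between balancer migrations, or after the balancer has been stopped (by setting `stopped: true` on the `balancer` document in `config.settings`) and the in-flight migration has finished. As a hedged sketch of the first option, the function below retries only on that specific error and returns any other failure immediately. The names `moveChunkWithRetry`, `runCommand`, and `sleep` are hypothetical parameters; whether a retry actually lands between migrations depends on the balancer's timing.

```javascript
// Retry a moveChunk command while the collection's metadata lock is taken.
// runCommand and sleep are injected so the logic can run outside the shell;
// in the mongo shell one might pass
//   function (c) { return db.adminCommand(c); }  and the shell's sleep().
function moveChunkWithRetry(runCommand, sleep, cmd, maxAttempts, delayMs) {
  var res;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    res = runCommand(cmd);
    if (res.ok === 1) {
      return { ok: true, attempts: attempt, result: res };
    }
    // The lock error is reported inside "cause", not in the top-level errmsg.
    var lockTaken = res.cause &&
        res.cause.errmsg === "the collection's metadata lock is taken";
    if (!lockTaken) {
      // Some other failure: retrying will not help, so return it as-is.
      return { ok: false, attempts: attempt, result: res };
    }
    // Wait before the next attempt (but not after the last one).
    if (attempt < maxAttempts) sleep(delayMs);
  }
  return { ok: false, attempts: maxAttempts, result: res };
}
```

A call would pass the same command document used above, for example `{moveChunk: "MigOidDB.MigOidCol", find: {_id: "buggzeeann_30324171"}, to: "repset_e"}`, together with an attempt limit and a delay in milliseconds.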