Type: Question
Resolution: Incomplete
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 1.8.1
Component/s: Admin, Sharding
Labels:
- concurrency
- sharding
Environment:
OS: CentOS 5.4. Host hardware: dual-socket Nehalem, 4 cores, 36 GB memory, 24 x 1 TB disks in a RAID 10 configuration. The new shard has 64 GB of memory and 12 x 300 GB disks in a RAID 10 configuration.

We have added a new shard to a 4-shard cluster, making it 5 shards. The cluster is under a very light workload. Watching the balancer, it appears that it is going to take 2-3 days to finish rebalancing the shards.

> db.printShardingStatus();
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }

shards:
{
"_id" : "repset_a",
"host" : "repset_a/lmdb-m03.mail.aol.com:7312,lmdb-d02.mail.aol.com:7312,lmdb-d01.mail.aol.com:7312"
}
{
"_id" : "repset_b",
"host" : "repset_b/lmdb-d05.mail.aol.com:7312,lmdb-m06.mail.aol.com:7312,lmdb-d04.mail.aol.com:7312"
}
{
"_id" : "repset_c",
"host" : "repset_c/lmdb-d03.mail.aol.com:7312,lmdb-m02.mail.aol.com:7312,lmdb-m01.mail.aol.com:7312"
}
{
"_id" : "repset_d",
"host" : "repset_d/lmdb-d06.mail.aol.com:7312,lmdb-m05.mail.aol.com:7312,lmdb-m04.mail.aol.com:7312"
}
{
"_id" : "repset_e",
"host" : "repset_e/lmdb-d08.mail.aol.com:7312,lmdb-m09.mail.aol.com:7312,lmdb-d07.mail.aol.com:7312"
}
databases:

{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "MigOidDB", "partitioned" : true, "primary" : "repset_a" }

MigOidDB.MigOidCol chunks:
repset_e 205
repset_c 1283
repset_a 1283
repset_d 1283
repset_b 1283
too many chunksn to print, use verbose if you want to force print

{ "_id" : "test", "partitioned" : false, "primary" : "repset_a" }
{ "_id" : "local", "partitioned" : false, "primary" : "repset_a" }
{ "_id" : "MigOidCol", "partitioned" : false, "primary" : "repset_a" }
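
As a rough check on the chunk counts above, the number of migrations still outstanding can be worked out by hand. The sketch below is plain JavaScript, not a MongoDB API: `chunksRemaining` and `estimateDays` are hypothetical helper names, and the minutes-per-chunk figure is an assumed input, not a measurement from this cluster. The balancer also stops once shards are within its threshold of each other, so the real number of migrations may be somewhat lower.

```javascript
// Chunk counts copied from the printShardingStatus() output above.
var chunks = {
  repset_a: 1283,
  repset_b: 1283,
  repset_c: 1283,
  repset_d: 1283,
  repset_e: 205
};

// Hypothetical helper: chunks that must still move to `target`
// before it holds an even share of the collection.
function chunksRemaining(counts, target) {
  var total = 0;
  var shards = 0;
  for (var name in counts) {
    total += counts[name];
    shards += 1;
  }
  var evenShare = Math.floor(total / shards); // 5337 / 5 -> 1067
  return Math.max(0, evenShare - counts[target]);
}

// Hypothetical helper: days needed at an ASSUMED average migration time.
function estimateDays(remainingChunks, minutesPerChunk) {
  return (remainingChunks * minutesPerChunk) / (60 * 24);
}

var remaining = chunksRemaining(chunks, "repset_e"); // 862
var days = estimateDays(remaining, 4);               // about 2.4 at an assumed 4 min/chunk
```

With 5337 chunks in total, an even share is about 1067 per shard, so repset_e still needs roughly 862 chunks. At an assumed 4 minutes per migration that is about 2.4 days, which is in line with the 2-3 days observed.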

We have tried using moveChunk to speed up the process, but the balancer holds a "metadata lock" on the collection and will not allow us to run a manual moveChunk.

> db.adminCommand({moveChunk : "MigOidDB.MigOidCol", find : {_id : "buggzeeann_30324171"}, to : "repset_e"});
{
"cause" : {
"who" : {
"_id" : "MigOidDB.MigOidCol",
"process" : "lmdb-d02.mail.aol.com:1303133674:1828878975",
"state" : 1,
"ts" : ObjectId("4db174cf7a18929e98aa4f5b"),
"when" : ISODate("2011-04-22T12:30:07.322Z"),
"who" : "lmdb-d02.mail.aol.com:1303133674:1828878975:conn34711:1543737603",
"why" : "migrate-{ _id: \"ballc21_28406862\" }"
},
"errmsg" : "the collection's metadata lock is taken",
"ok" : 0
},
"ok" : 0,
"errmsg" : "move failed"
}
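
The failed response already identifies the lock holder in its `cause.who` document. A minimal sketch of pulling those fields out, assuming the response shape shown above: `describeLockHolder` is a hypothetical helper, not a shell built-in, and the shell-only `ObjectId`/`ISODate` values are replaced with plain strings or omitted so the snippet runs in any JavaScript runtime.

```javascript
// Hypothetical helper: summarize who holds the collection's metadata lock,
// given a failed moveChunk response shaped like the one above.
function describeLockHolder(response) {
  var lock = response && response.cause && response.cause.who;
  if (!lock) {
    return null; // no lock document in the response
  }
  return {
    collection: lock._id,
    holder: lock.who,
    since: lock.when,
    reason: lock.why
  };
}

// Sample input copied from the response above ("ts" omitted, "when" as a string).
var failed = {
  cause: {
    who: {
      _id: "MigOidDB.MigOidCol",
      process: "lmdb-d02.mail.aol.com:1303133674:1828878975",
      state: 1,
      when: "2011-04-22T12:30:07.322Z",
      who: "lmdb-d02.mail.aol.com:1303133674:1828878975:conn34711:1543737603",
      why: "migrate-{ _id: \"ballc21_28406862\" }"
    },
    errmsg: "the collection's metadata lock is taken",
    ok: 0
  },
  ok: 0,
  errmsg: "move failed"
};

var holder = describeLockHolder(failed);
```

Here `holder.reason` starts with `migrate-`, which matches the observation that the lock is held by a migration already in flight rather than by another manual command.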

slow_rebalance_data_for_10Gen.tar
880 kB
Apr 23 2011 12:34:03 PM UTC

Assignee: Unassigned
Reporter: John Schulz
Participants: Eliot Horowitz, John Schulz
Votes: 0
Watchers: 0

Created: Apr 22 2011 12:41:46 PM UTC
Updated: May 10 2012 02:15:07 PM UTC
Resolved: Sep 02 2011 04:55:31 AM UTC