[SERVER-2985] Rebalancing too slow and moveChunk is blocked by balancer lock Created: 22/Apr/11 Updated: 10/May/12 Resolved: 02/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin, Sharding |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Critical - P2 |
| Reporter: | John Schulz | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | concurrency, sharding | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
OS: CentOS 5.4. Host HW: dual-socket Nehalem, 4 cores, 36GB memory, 24 1TB disks in a RAID 10 configuration. The new shard has 64GB of memory with 12 300GB disks in a RAID 10 configuration. |
||
| Attachments: |
|
| Participants: |
| Description |
|
We have added a new shard to a 4-shard cluster, making it 5 shards. The cluster is under a very light workload. Watching the balancer, it appears that it's going to take 2-3 days to finish rebalancing the shards.
> db.printShardingStatus();
shards: MigOidDB.MigOidCol chunks:
We have tried using moveChunk to speed the process up, but the balancer holds a "Metadata Lock" on the collection and will not allow us to do a manual moveChunk.
> db.adminCommand({moveChunk : "MigOidDB.MigOidCol", find : {_id : "buggzeeann_30324171"}, to : "repset_e"});
" |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ] |
|
There are lots of improvements in 1.8.3, if you haven't already tried it. |
| Comment by Eliot Horowitz (Inactive) [ 29/Apr/11 ] |
|
Sorry, I didn't have a chance to look at this in the interim. |
| Comment by John Schulz [ 23/Apr/11 ] |
|
Attached the changelog and a verbose printShardingStatus. The actual server and mongos logs are very verbose and multiple gigabytes in size. If you can give me some specific ideas as to which logs and what data from those logs you are interested in, I will do what I can to collect that data. 1.5 days after shard repset_e was added, and just under 1 day after the rest of the shards were added, we have rebalanced 382 of the 4100+ chunks that we need to rebalance. For several hours yesterday we completely shut down all inserts to the cluster, and the rebalancing did not go any faster or slower. |
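As a rough check on the figures in this comment, the remaining time can be extrapolated linearly from the progress so far. This is only a sketch in plain JavaScript (runnable under Node.js): the helper name `estimateDaysRemaining` is hypothetical and not part of MongoDB, the total of 4100 is the lower bound of "4100+", and the migration rate is assumed to stay constant. Because the comment mentions two elapsed windows (1.5 days and just under 1 day), both are tried.

```javascript
// Hypothetical helper (not a MongoDB API): linear extrapolation of
// balancer progress, assuming the migration rate stays constant.
function estimateDaysRemaining(movedChunks, totalChunks, elapsedDays) {
  if (movedChunks <= 0 || elapsedDays <= 0) {
    throw new Error("need a positive number of moved chunks and elapsed days");
  }
  const chunksPerDay = movedChunks / elapsedDays;
  const remainingChunks = Math.max(totalChunks - movedChunks, 0);
  return remainingChunks / chunksPerDay;
}

// Figures taken from the comment; 4100 is a lower bound ("4100+").
const movedSoFar = 382;
const totalToMove = 4100;

// Two possible elapsed windows mentioned in the comment (in days).
for (const elapsedDays of [1.5, 1.0]) {
  const days = estimateDaysRemaining(movedSoFar, totalToMove, elapsedDays);
  console.log(
    "elapsed " + elapsedDays + " d -> about " + days.toFixed(1) + " more days"
  );
}
```

Under these assumptions the estimate comes out at roughly 10 to 15 more days, depending on which window is used.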
| Comment by Eliot Horowitz (Inactive) [ 23/Apr/11 ] |
|
You can only move 1 chunk at a time, so the lock is normal. Are you sure it's "slow", or is it bound by disk, etc.? Can you send the log files? |