[SERVER-5881] One of the shards is taking more data than others Created: 21/May/12  Updated: 15/Aug/12  Resolved: 20/Jun/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Preetham Derangula Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux


Attachments: dbstats.txt, dbstats_05292012.txt, printShardingStatus.txt, printShardingStatus_05292012.txt
Operating System: ALL
 Description   

I have 6 shards taking application audit data. One of the shards is taking more data (24GB) compared with the others (8-9GB). I used all the default settings. Any idea how this can be avoided?
Thanks, Preetham
I have attached the printShardingStatus() output.



 Comments   
Comment by Spencer Brody (Inactive) [ 20/Jun/12 ]

Going to resolve due to inactivity. If this is still causing you problems feel free to re-open

Comment by Spencer Brody (Inactive) [ 01/Jun/12 ]

Unfortunately, it could take quite some time for the lower chunk size to make a difference, since chunks will only be split to the new, smaller size after a certain number of writes have happened to them. You could speed up the process by manually splitting the chunks on shard2 using the split command. If you split the chunks on shard2 into many smaller pieces, the balancer will then offload them to the other shards. Information on how to split chunks is available here: http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
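As an illustration, here is a minimal mongo shell sketch of a manual split run through a mongos. The namespace audit.events, the shard key field uid, and the split value are assumptions for the example; substitute your own:

    // split the chunk containing this shard key value at exactly that value
    db.adminCommand({ split: "audit.events", middle: { uid: "50000" } })
    // or let mongos split the chunk containing that value at its median
    db.adminCommand({ split: "audit.events", find: { uid: "50000" } })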

An imbalance this large, however, suggests that there may be something about your shard key or usage pattern causing it. How are the UIDs you use for the shard key generated? Is there any reason the deletes you run would be more likely to hit the other shards than shard2?

Comment by Preetham Derangula [ 29/May/12 ]

I have made the change you suggested (chunkSize is now 30) and gave it some time to see if the shard size imbalance would correct itself, but it didn't. I have attached the updated printShardingStatus() and db.stats() output.

Comment by Spencer Brody (Inactive) [ 29/May/12 ]

I'm going to go ahead and resolve this issue. If you have further questions, feel free to re-open or create a new ticket.

Comment by Spencer Brody (Inactive) [ 21/May/12 ]

The balancer in MongoDB only balances based on the number of chunks, not data size, so discrepancies like this in the data size per shard can happen. It seems like even though the number of chunks is the same for each shard, the chunks on shard2 have ~100MB of data in them on average, while the chunks on the other shards have closer to 35MB on average. This could happen if most of the documents being deleted are coming from chunks on the other shards, and not many of them are hitting shard2. Can you think of any reason this might be the case?
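To verify this yourself from a mongos shell, a rough sketch (audit.events and the shard key { uid: 1 } are assumed placeholders; take a chunk's min/max bounds from its config.chunks document):

    use config
    // chunk count per shard for the collection
    db.chunks.find({ ns: "audit.events", shard: "shard2" }).count()
    // measure the data held by one chunk's key range
    use audit
    db.runCommand({ dataSize: "audit.events", keyPattern: { uid: 1 },
                    min: { uid: MinKey }, max: { uid: "50000" } })

Dividing each shard's collection data size (from db.stats()) by its chunk count gives the average chunk fill per shard.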

One thing you can do to help mitigate against this is lower the max chunk size. This will cause chunks to split more often into smaller pieces, making migrations happen more often, but allowing data to be balanced at a more granular level. Documentation on how to change the chunk size is available here: http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-ChunkSizeConsiderations
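Concretely, lowering the max chunk size is a one-document change in the config database, run from a mongos (the value is in MB; 30 here is only an example):

    use config
    db.settings.save({ _id: "chunksize", value: 30 })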

Comment by Preetham Derangula [ 21/May/12 ]

Never changed it. It's 200MB. A maintenance process runs every day to delete any data that's older than 8 days.
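The delete is roughly like this (the collection and timestamp field names here are simplified placeholders):

    // remove audit documents older than 8 days
    var cutoff = new Date(Date.now() - 8 * 24 * 60 * 60 * 1000);
    db.events.remove({ ts: { $lt: cutoff } });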

Comment by Spencer Brody (Inactive) [ 21/May/12 ]

Have you ever changed the chunk size for this cluster? What is the chunk size now (you can find it by querying the config.settings collection; see the sketch below)?
How often do documents get deleted?
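For reference, a quick check from a mongos looks something like:

    use config
    db.settings.find({ _id: "chunksize" })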

Comment by Preetham Derangula [ 21/May/12 ]

No, they are not in MMS.

Comment by Preetham Derangula [ 21/May/12 ]

shard2 is the one taking more data. I have attached the db.stats() output.

Comment by Spencer Brody (Inactive) [ 21/May/12 ]

Which shard is the one taking more data than the others? Can you attach db.stats() from the primary of each shard?
Are these machines in MMS?
