[SERVER-2487] Remove empty chunks (consolidate to neighbor chunk) Created: 05/Feb/11 Updated: 17/Mar/23 Resolved: 17/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0 |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Scott Hernandez (Inactive) | Assignee: | [DO NOT USE] Backlog - Sharding EMEA |
| Resolution: | Done | Votes: | 39 |
| Labels: | balancer, chunks, merge | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Sharding EMEA |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Add a scavenger that finds empty chunks and consolidates them into the neighbor chunk(s). Doing this could trigger re-balancing, since the number of chunks will change. Maybe this should be a manual command? |
| Comments |
| Comment by Josef Ahmad [ 07/Jun/19 ] | |||||||
|
To identify empty chunks, try the following script, which prints the size of each chunk in the collection. Set `ns` to the target namespace.
I've tested this script against 3.6.7, but please test it in your own environment as well. You may also find this community repository interesting: https://github.com/vmenajr/mongo/blob/master/sharding_utils.js (it contains a script to consolidate chunks). Be sure to test before using it in production. Please note that the repository is not developed, maintained, or officially supported by MongoDB. | |||||||
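The script referenced above did not survive in this export. A minimal sketch of the approach it describes, iterating `config.chunks` and running the `dataSize` admin command over each chunk's bounds, might look like the following (mongo shell, run against a mongos; the namespace is an assumption, and the `ns` field on `config.chunks` matches the 3.6-era schema mentioned in the comment):

```javascript
// Sketch: print the data size and document count of every chunk in `ns`.
// Assumptions: run in the mongo shell against a mongos; 3.6-era config schema
// where config.chunks documents carry an `ns` field.
var ns = "mydb.mycoll"; // assumption: set to your sharded namespace

var configDB = db.getSiblingDB("config");
var dataDB = db.getSiblingDB(ns.split(".")[0]);

// The shard key pattern is needed so dataSize can interpret the chunk bounds.
var keyPattern = configDB.collections.findOne({ _id: ns }).key;

configDB.chunks.find({ ns: ns }).sort({ min: 1 }).forEach(function (chunk) {
    // dataSize scans the range [min, max) and reports bytes and object count.
    var res = dataDB.runCommand({
        dataSize: ns,
        keyPattern: keyPattern,
        min: chunk.min,
        max: chunk.max,
        estimate: false
    });
    print(tojson(chunk.min) + " -> " + tojson(chunk.max) +
          "  size: " + res.size + "  numObjects: " + res.numObjects);
});
```

Chunks reported with `numObjects: 0` are the empty ones. Note that `dataSize` with `estimate: false` scans the range, so this can be slow on large collections.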
| Comment by amardeep singh [ 12/Apr/19 ] | |||||||
|
I am running MongoDB 3.6.7 and I have run into the same problem. I deleted bulk data and that has left me with a lot of empty chunks. I need a way to identify them and then either delete them or merge them. But first I need a way to identify them: how do I identify the empty chunks, and then how do I either delete them or merge them with other, bigger chunks that contain data? | |||||||
| Comment by Kevin Rice [ 05/Oct/17 ] | |||||||
|
It's surprising to me this hasn't been solved yet. | |||||||
| Comment by Kay Agahd [ 08/Jan/14 ] | |||||||
|
Removing empty chunks would be fine indeed. | |||||||
| Comment by Kevin J. Rice [ 03/Aug/13 ] | |||||||
|
I have an additional use case: I mongorestore'd part of a db into a pre-split database. I accidentally turned on dataflow, which did inserts and created keys it shouldn't have. Having to abandon the data but not wanting to rebuild everything, I did db.collectionName.remove() and re-did the mongorestore. HOWEVER, all the old (now empty) chunks are still there. There's no way to get rid of them. Granted, in my case, with my random distribution, I should fill these up again, so no worries. But, in the meantime I'm way unbalanced during mongorestore. As an aside, I can readily see that moving documents out into another mongo instance (archiving old data) will result in possibly empty chunks, which would be good to consolidate. | |||||||
| Comment by Grégoire Seux [ 30/Jan/13 ] | |||||||
|
Are there any issues that will arise if we do it manually by editing the config database? We have around 11,000 chunks and 15% of them are empty, which creates imbalance. The operation would be done as follows: | |||||||
| Comment by Luke Ehresman [ 08/Jun/11 ] | |||||||
|
Eliot, the shard key was chosen on purpose with that in mind. We have enough collections that the writes tend to even out across all our shards, as long as the primary chunks (the ones all the writes go to) are distributed across the shards. The benefit of having the timestamp in the shard key is that we didn't need to create another index just for the shard key. Since we query only based on time (i.e. the last X minutes of data), an index on anything else would only be used for the shard key, which seemed like a waste and was dragging down insert performance. Can you see any harm in just leaving these empty chunks? It seems suboptimal, as they will continue to accumulate over time. But other than changing our shard key (which has the other implications mentioned above), I don't see a way around it. | |||||||
| Comment by Eliot Horowitz (Inactive) [ 08/Jun/11 ] | |||||||
|
@luke - you may want to consider changing your shard key to a non-time-based key. Time-based isn't great, as all writes will tend to hit the same shard anyway. | |||||||
| Comment by Luke Ehresman [ 08/Jun/11 ] | |||||||
|
This is affecting us too. Our shard key is time based, and we prune old documents. So we end up with lots of old chunks that are empty. The problem is that the balancer is moving these empty chunks around trying to balance the shard, with little actual effect. | |||||||
| Comment by Keith Branton [ 22/Mar/11 ] | |||||||
|
I'd rather see it on by default, with the option to disable it if I need to do any pre-splitting. | |||||||
| Comment by Sergei Tulentsev [ 22/Mar/11 ] | |||||||
|
I guess this should be off by default; the admin has to enable it explicitly. | |||||||
| Comment by Scott Hernandez (Inactive) [ 22/Mar/11 ] | |||||||
|
If this is ever implemented, we should be careful with pre-splitting when prepping to load new data. | |||||||
| Comment by Sergei Tulentsev [ 13/Mar/11 ] | |||||||
|
Here's a use case: there used to be a range in the keyspace that was actively used. Namely, I was inserting string representations of numerical IDs instead of actual integers. By the time I realized that, those records already spanned over 20% of the chunks. I have reinserted the records (and deleted the old ones). Now I have a bunch of empty chunks, which cause actual imbalance in the shard. So I vote for the manual command to consolidate those. I am even fine if it blocks the cluster for a couple of dozen seconds. | |||||||
| Comment by Scott Hernandez (Inactive) [ 05/Feb/11 ] | |||||||
|
Egbert: That is a good reason for it to be a manual command that the admin runs. Eliot, it seems like the flow is: 1) move the chunk to the same shard as its neighbor, 2) combine the chunks in config (transactionally, as a move is now). That should take care of the document-inserted-during-the-operation issue. It just seems like finding the candidates for consolidation is easier for empty chunks, and the consolidation itself should be faster since the chunks contain no documents. Maybe you are correct, though, and the consolidation is basically the same. Figuring out which chunks to consolidate could be more complicated in the general case. | |||||||
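The two-step flow described above (move the empty chunk next to its neighbor, then combine them) later became expressible with the real `mergeChunks` admin command. A hedged sketch, run in the mongo shell against a mongos; the namespace, shard key values, and shard name are all hypothetical:

```javascript
// Sketch: merge an empty chunk into a contiguous neighbor.
// Assumptions: shard key {x: 1}; the empty chunk is [{x: 100}, {x: 200})
// and its neighbor is [{x: 200}, {x: 300}); "shard0001" holds the neighbor.

// Step 1 (only if the chunks live on different shards): move the empty
// chunk onto the neighbor's shard first, since mergeChunks requires all
// chunks in the range to be on the same shard.
db.adminCommand({
    moveChunk: "mydb.mycoll",
    find: { x: 150 },        // any key falling inside the empty chunk
    to: "shard0001"          // the shard holding the neighbor chunk
});

// Step 2: merge the contiguous range into a single chunk. `bounds` is
// [min of the first chunk, max of the last chunk] in the range.
db.adminCommand({
    mergeChunks: "mydb.mycoll",
    bounds: [ { x: 100 }, { x: 300 } ]
});
```

This requires the chunks in `bounds` to be contiguous with no gaps; the command fails otherwise, which is the safety property Scott's "transactionally, as a move is now" point is after.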
| Comment by Egbert Groot [ 05/Feb/11 ] | |||||||
|
Also to consider: simply deleting empty chunks can't be done imo, since it conflicts with the idea of 'pre-splitting' chunks. | |||||||
| Comment by Eliot Horowitz (Inactive) [ 05/Feb/11 ] | |||||||
|
It's just as hard, since we can't guarantee a document isn't inserted into that chunk at any point in the process. | |||||||
| Comment by Scott Hernandez (Inactive) [ 05/Feb/11 ] | |||||||
|
It seems like general chunk consolidation is yet another task, and a much harder one. Removing empty chunks, where the chunk contains no documents, is a specialized form of consolidation. | |||||||
| Comment by Eliot Horowitz (Inactive) [ 05/Feb/11 ] | |||||||
|
Chunk consolidation is incredibly hard in the general case, so while it would be nice, it's very low priority.