Core Server / SERVER-2487

Remove empty chunks (consolidate to neighbor chunk)

    Description

      Add a scavenger that finds empty chunks and consolidates them into the neighboring chunk(s).

      Doing this could cause re-balancing since the number of chunks will change.

      Maybe this should be a manual command?
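
      Until such a scavenger exists, empty chunks can at least be detected from the shell. A minimal sketch, assuming a connection to a mongos and using the `dataSize` command; the namespace `test.events` and shard key pattern `{ ts: 1 }` are placeholders, and on the MongoDB versions of this era `config.chunks` is keyed by `ns`:

      ```javascript
      // Hypothetical helper: walk config.chunks for one collection and
      // report every chunk whose range contains no documents.
      var ns = "test.events";     // placeholder namespace
      var key = { ts: 1 };        // placeholder shard key pattern

      db.getSiblingDB("config").chunks.find({ ns: ns }).forEach(function (chunk) {
        var res = db.getSiblingDB("test").runCommand({
          dataSize: ns,
          keyPattern: key,
          min: chunk.min,
          max: chunk.max
        });
        if (res.numObjects === 0) {
          print("empty chunk on " + chunk.shard + ": " +
                tojson(chunk.min) + " -> " + tojson(chunk.max));
        }
      });
      ```

      Note that `dataSize` scans the range it is given, so running this across thousands of chunks has a cost of its own.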

          Activity

          Eliot Horowitz (eliot) added a comment:

          @luke - you may want to consider changing your shard key to a non-time-based key. A time-based key isn't great, as all writes will tend to hit the same shard anyway.

          Luke Ehresman (lehresman) added a comment:

          Eliot, the shard key was chosen on purpose with that in mind. We have enough collections that the writes tend to even out across all our shards, as long as the primary chunks (the ones all the writes go to) are distributed across the shards. The benefit of having the timestamp in the shard key is that we didn't need to create another index just for the shard key. Since we query only based on time (i.e. the last X minutes of data), an index on anything else would be used only for the shard key, which seemed like a waste and was dragging down insert performance.

          Can you see any harm in just leaving these empty chunks? It seems suboptimal, as their number will continue to grow over time. But other than changing our shard key (which has the other implications mentioned above), I don't see a way around it.

          Grégoire Seux (kamaradclimber) added a comment:

          Is there any issue that will arise if we do it manually by editing the config database? We have around 11,000 chunks and 15% of them are empty, which creates imbalance.

          The operation would be done as follows:
          1. deactivate the balancer
          2. move the empty chunk to the shard of one of its neighbors
          3. remove the empty chunk
          4. extend the neighbor to cover the vacated shard key range
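
          The four steps above can be sketched in the mongo shell roughly as follows. The namespace, ranges, and shard name are placeholders, and note that the `mergeChunks` admin command (available from MongoDB 2.6) performs steps 3 and 4 in one operation once both chunks sit on the same shard:

          ```javascript
          // Run against a mongos. "test.events", the { ts: ... } bounds,
          // and "shard0000" are placeholders for your deployment.
          sh.setBalancerState(false);               // 1. deactivate the balancer

          // 2. move the empty chunk (here { ts: 100 }..{ ts: 200 }) to the
          //    shard holding its left neighbour, using a value in its range
          db.adminCommand({
            moveChunk: "test.events",
            find: { ts: 150 },
            to: "shard0000"
          });

          // 3 + 4. mergeChunks drops the boundary, extending the neighbour
          //        ({ ts: 0 }..{ ts: 100 }) to cover the empty range
          db.adminCommand({
            mergeChunks: "test.events",
            bounds: [ { ts: 0 }, { ts: 200 } ]      // neighbour.min .. empty.max
          });

          sh.setBalancerState(true);                // re-enable the balancer
          ```

          Editing the config database by hand, as proposed, is riskier than going through admin commands, since every mongos and shard must agree on the chunk metadata.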

          Kevin J. Rice (justanyone) added a comment:

          I have an additional use case: I mongorestore'd part of a db into a pre-split database. I accidentally turned on dataflow, which did inserts and created keys it shouldn't have. Having to abandon the data but not wanting to rebuild everything, I did db.collectionName.remove() and re-did the mongorestore. HOWEVER, all the old (now empty) chunks are still there. There's no way to get rid of them.

          Granted, in my case, with my random distribution, I should fill these up again, so no worries. But, in the meantime I'm way unbalanced during mongorestore.

          As an aside, I can readily see that moving documents out into another mongo instance (archiving old data) will result in possibly empty chunks, which would be good to consolidate.

          agahd (kaga) added a comment (edited):

          Removing empty chunks would indeed be fine.
          However, it would be even better to join two chunks when they contain too little data. That would be analogous to splitting a chunk when it becomes too big.
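
          As a sketch of what such a merge policy might look like, here is a hypothetical planner (the `planMerges` function and chunk shape are illustrative names, not MongoDB internals) that folds a chunk into its left neighbour whenever either side of a boundary holds less data than a threshold — the inverse of the existing auto-split:

          ```javascript
          // chunks: contiguous, sorted array of { min, max, bytes } ranges.
          // Returns a new array where boundaries adjacent to an
          // under-threshold chunk have been removed.
          function planMerges(chunks, minBytes) {
            const out = [];
            for (const c of chunks) {
              const prev = out[out.length - 1];
              if (prev && (c.bytes < minBytes || prev.bytes < minBytes)) {
                // Drop the boundary: the surviving left chunk absorbs
                // the range and the data of the chunk being folded in.
                prev.max = c.max;
                prev.bytes += c.bytes;
              } else {
                out.push({ min: c.min, max: c.max, bytes: c.bytes });
              }
            }
            return out;
          }

          // Three chunks, two of them empty, collapse into a single chunk.
          const merged = planMerges(
            [
              { min: 0, max: 10, bytes: 0 },
              { min: 10, max: 20, bytes: 5000 },
              { min: 20, max: 30, bytes: 0 },
            ],
            1024
          );
          ```

          A real implementation would also have to cap the merged size so a merge does not immediately trigger a split again.
          
          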


            Votes: 31
            Watchers: 28
            Days since reply: 1 year, 14 weeks, 3 days