[SERVER-5047] Better chunk selection when balancing Created: 23/Feb/12  Updated: 16/Jan/24

Status: Blocked
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Greg Studer Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 11
Labels: oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File balance.pl     Text File balancer.patch    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-30653 Randomize Chunk Locations with chunkL... Closed
Related
is related to SERVER-9114 Add balancer strategy that balances b... Blocked
is related to SERVER-9120 Load based balancing Blocked
Assigned Teams:
Cluster Scalability

 Description   

The balancer will always move the lowest chunk on the shard, unless the highest chunk matches the lowest chunk on another shard. Without any more information about data growth, the choice of chunk should probably be random, or a smarter one that minimizes fragmentation.
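
For illustration only, a minimal Python sketch (not the server code; names are invented) of the difference between the current "move the lowest chunk" selection and a random pick:

import random

def pick_chunk_current(donor_chunks):
    """Simplified view of today's behavior: always move the chunk with
    the lowest key range on the donor shard (list sorted by min key)."""
    return donor_chunks[0]

def pick_chunk_random(donor_chunks):
    """One alternative suggested here: pick a random chunk, so the donor
    is not always stripped from the low end of its key range."""
    return random.choice(donor_chunks)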



 Comments   
Comment by Kevin Pulo [ 26/Nov/18 ]

unless the highest chunk matches the lowest chunk on another shard

This hasn't been the case for quite some time (since at least 2.6), so I updated the description.

Comment by Grégoire Seux [ 16/Oct/12 ]

A nice-to-have would of course be a pluggable balancer strategy.

Comment by Grégoire Seux [ 15/Jun/12 ]

Hello,

I have seen recent patches by Eliot making the balancer tag-aware. Is there any progress on the thinking around improving the balancer policy?

Comment by Grégoire Seux [ 05/Jun/12 ]

I agree that being able to choose a strategy could be nice. However, I can't think of a case where the default behavior would be better than the proposed patch.
The current implementation has serious drawbacks, and random shard assignment would probably behave better.
As for a user-provided strategy, it is very nice on paper but very error-prone.

Comment by Derek Chen-Becker [ 05/Jun/12 ]

I agree with Neil. The default behavior is workable for many cases, but it would be really nice to be able to provide a custom selector for which chunk gets migrated from a given shard.

Comment by Neil Sanchala [ 05/Jun/12 ]

I don't think there exists a perfect chunk-assignment strategy that works for all use-cases. Could there be a set of strategies you can choose between?

Alternatively, you could allow users to specify a snippet of JS that makes the choice, given the list of source chunks and dest chunks. The default behavior and Pierre's suggestion could both be implemented pretty easily in JS. This piece of code doesn't have to be all that performant, different people will want different things, the inputs are clearly defined, and the output is easily validated. Sounds perfect for JS?

Pierre: nice code, btw.
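
As a purely illustrative sketch of the pluggable-strategy idea above (the comment proposes JS; Python is used here only for readability, and all names are invented):

def default_strategy(donor_chunks, receiver_chunks):
    # Mirrors today's behavior: move the donor's lowest chunk.
    return donor_chunks[0]

def spread_strategy(donor_chunks, receiver_chunks):
    # Example custom policy: move the donor chunk whose position in the
    # key range is farthest from any chunk already on the receiver.
    def distance(chunk):
        if not receiver_chunks:
            return chunk["pos"]
        return min(abs(chunk["pos"] - r["pos"]) for r in receiver_chunks)
    return max(donor_chunks, key=distance)

def pick_chunk(donor_chunks, receiver_chunks, strategy=default_strategy):
    # The balancer would call whichever strategy is configured; the inputs
    # are the movable chunks on the donor and the chunks already on the
    # receiver, and the output is the single chunk to migrate.
    return strategy(donor_chunks, receiver_chunks)

# Example: pick_chunk([{"pos": p} for p in range(10)],
#                     [{"pos": 2}, {"pos": 3}], spread_strategy) -> {"pos": 9}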

Comment by Pierre Ynard [ 16/May/12 ]

Alright. If you think of a particular use case and suspect or find that it yields bad results, feel free to let me know and I'll look into it.

Comment by Eliot Horowitz (Inactive) [ 16/May/12 ]

There are certain classes of use cases where this is good and others where it will make things worse.
So we need to test something like this across lots of different use cases.

Comment by Pierre Ynard [ 16/May/12 ]

Could we have an update on this? We've been running this in production on our cross-datacenter cluster with dozens of servers, and so far haven't seen any problem.

Comment by Derek Chen-Becker [ 02/May/12 ]

We're seeing sub-optimal Map/Reduce performance due to this issue. We have 5 shards in our cluster, but for various sets of data the chunks are quite unbalanced. In one case, almost 2.5M docs reside on each of two shards, while two more shards have 400k and the final shard has zero docs. If someone with more knowledge thinks Pierre's patch looks good, we would definitely consider using it.

Comment by Pierre Ynard [ 30/Apr/12 ]

Here is my patch implementing this behavior in the balancer. It was tested and validated with real data. Please consider for inclusion.

Comment by Pierre Ynard [ 23/Apr/12 ]

Chunk balancer logic test script

Comment by Pierre Ynard [ 23/Apr/12 ]

Hello,

I implemented a new algorithm for the pickChunk() function of the balancer. Instead of choosing the first chunk or a random chunk, it finds the biggest gap between two chunks of the receiver shard (i.e. with the most chunks of the donor shard between them) and picks a chunk in the middle of it. This tries to ensure that data is evenly distributed among shards across the key range, and fixes the issue Grégoire mentioned. It should address Benjamin's issue too, as, after shards are balanced, chunks to move will tend to be selected from where data is hot and growing.
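
A minimal Python sketch of the gap-finding idea described above (the attached patch is the real implementation; this is only an illustration with invented names):

def pick_chunk_gap(chunks, donor, receiver):
    # chunks: list of (min_key, shard) pairs sorted by min_key.
    # Find the longest run of donor-shard chunks lying between two
    # receiver-shard chunks (or the ends of the key range), then return
    # the donor chunk in the middle of that run.
    best_run, current_run = [], []
    for i, (_, shard) in enumerate(chunks):
        if shard == receiver:
            if len(current_run) > len(best_run):
                best_run = current_run
            current_run = []
        elif shard == donor:
            current_run.append(i)
    if len(current_run) > len(best_run):
        best_run = current_run
    if not best_run:
        return None  # nothing movable on the donor
    return chunks[best_run[len(best_run) // 2]]

# e.g. pick_chunk_gap([(0, "s1"), (10, "s1"), (20, "s2"),
#                      (30, "s1"), (40, "s1"), (50, "s1")],
#                     donor="s1", receiver="s2") -> (40, "s1")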

I'm attaching a simulation script that I used to test and compare this behavior to the current one. For example, with 3 shards starting with all data on shard 1 only (. is a chunk on shard 1, O a chunk on shard 2 and X a chunk on shard 3, and each line is a balancing round):

..................................................
........................O.........................
........................OX........................
...........O............OX........................
...........O............OX...........X............
...........O............OX...........XO...........
...........OX...........OX...........XO...........
.....O.....OX...........OX...........XO...........
.....O.....OX.....X.....OX...........XO...........
.....O.....OX.....X.....OX.....O.....XO...........
.....O.....OX.....X.....OX.....O.....XO.....X.....
.....O.....OX....OX.....OX.....O.....XO.....X.....
....XO.....OX....OX.....OX.....O.....XO.....X.....
....XO.....OX....OX.....OX.....O.....XO....OX.....
....XO.....OX....OX.....OX....XO.....XO....OX.....
....XO..O..OX....OX.....OX....XO.....XO....OX.....
....XO..O..OX....OX..X..OX....XO.....XO....OX.....
....XO..O..OX....OX..X..OX....XO..O..XO....OX.....
....XO..O..OX....OX..X..OX....XO..O..XO....OX..X..
.O..XO..O..OX....OX..X..OX....XO..O..XO....OX..X..
.O..XO.XO..OX....OX..X..OX....XO..O..XO....OX..X..
.O..XO.XO..OX.O..OX..X..OX....XO..O..XO....OX..X..
.O..XO.XO..OX.O..OX..X..OX.X..XO..O..XO....OX..X..
.O..XO.XO..OX.O..OX.OX..OX.X..XO..O..XO....OX..X..
.O..XO.XO..OX.O..OX.OX..OX.X..XO.XO..XO....OX..X..
.O..XO.XO..OX.O..OX.OX..OX.X..XO.XO..XO.O..OX..X..
.OX.XO.XO..OX.O..OX.OX..OX.X..XO.XO..XO.O..OX..X..
.OX.XO.XO..OX.O..OX.OX..OX.X..XO.XO..XO.O..OX.OX..
.OX.XO.XO..OX.OX.OX.OX..OX.X..XO.XO..XO.O..OX.OX..
.OX.XO.XO..OX.OX.OX.OX..OX.XO.XO.XO..XO.O..OX.OX..
.OX.XO.XO..OX.OX.OX.OX..OX.XO.XO.XO..XO.OX.OX.OX..
.OX.XO.XOO.OX.OX.OX.OX..OX.XO.XO.XO..XO.OX.OX.OX..
.OX.XO.XOO.OX.OX.OX.OXX.OX.XO.XO.XO..XO.OX.OX.OX..

This is ugly :/ but run the script from a terminal to see other examples with shiny terminal colors!

There is a minor flaw in this algorithm when adding several shards at a time with badly (or not at all) distributed data (shown in test #5): the new shards get chunks only from separate parts of the key range. But I don't see this as a real issue.

I still need to test my patch with real mongod and mongos processes, but if you like this approach then I'll post it once it's ready.

Comment by Grégoire Seux [ 13/Apr/12 ]

Looking at the current code, we've found that the balancer tries to minimize fragmentation of chunks (trying to move successive chunks to the same shards). Is there a reason for that objective?

Comment by Grégoire Seux [ 13/Apr/12 ]

My guess would be that successive chunks (in shard key order) should not be on the same shard. This increases performance in the case of sequential access or roughly targeted access patterns.
For instance, we have a compound shard key (partnerId, uniqueId) and we know that a given partnerId could become a hotspot for a short period of time. If consecutive chunks (having the same partnerId) are on different shards, the load will be balanced across all shards.

A random strategy works for most cases; interlacing works better on mine.

The current strategy is like no sharding at all for my setup: 3 shards. The first half of the chunks is well balanced between 2 shards, while the second half is all on the third one (the primary shard of the database, of course).
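
To make the interlacing idea concrete, a tiny hypothetical sketch (names invented) of assigning consecutive chunks round-robin across shards, so a hotspot confined to one partnerId still spreads over every shard:

def interleaved_assignment(num_chunks, shards):
    # Assign consecutive chunks (in shard key order) round-robin across
    # the shards, rather than keeping neighboring chunks together.
    return [shards[i % len(shards)] for i in range(num_chunks)]

# interleaved_assignment(6, ["shardA", "shardB", "shardC"])
# -> ['shardA', 'shardB', 'shardC', 'shardA', 'shardB', 'shardC']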

Comment by Benjamin Darfler [ 19/Mar/12 ]

We have architected our system to use a shard key conforming to Kristina's suggestions in her blog post (http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/). However, she has confirmed that this shard key won't work well with the balancer logic in 2.0. I want to make sure that this is addressed in the 2.2 release.

The biggest pain point we have experienced with MongoDB has been balancing data. When cold chunks are moved by the balancer, it requires significant I/O to read the data and eats away at RAM that could be better used to handle online queries. The benefit of Kristina's design is that chunks are split and moved while they are hot, addressing both of the issues with cold chunks. This is critical for online balancing without huge performance impacts. If we could simply add a new shard and hot data would start migrating over to it seamlessly, we could avoid our seemingly never-ending monthly crisis of adding shards and manually moving chunks to distribute load, or, even worse, doing a full dump and load of the data to balance everything out.

Please, help make this experience better.
