Priority: Major - P3
Affects Version/s: 3.0.8, 3.3.11
Fix Version/s: None
Steps To Reproduce:
- Create two-shard cluster;
- Stop the balancer;
- Create an empty sharded collection and split it into 500,000 chunks.
(100k chunks may be enough to make a noticeable difference, but at 500k it becomes very easy to observe.)
- Start the balancer and observe the time taken to move each chunk.
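The split step above could be scripted roughly as follows. This is only a sketch: the namespace `test.foo`, the shard key `_id`, and the spacing between split points are assumptions for illustration and are not taken from the original test. The helper only builds `split` command documents, so it runs without a cluster.

```python
from typing import Dict, Iterator


def split_commands(namespace: str, key: str, num_chunks: int,
                   step: int = 1000) -> Iterator[Dict]:
    """Yield one `split` admin command per split point.

    Splitting an unsplit collection at (num_chunks - 1) distinct points
    leaves num_chunks chunks. The evenly spaced integer split points are
    an assumption; any distinct shard-key values would do.
    """
    for i in range(1, num_chunks):
        # "split" must be the first key so it is read as the command name.
        yield {"split": namespace, "middle": {key: i * step}}


if __name__ == "__main__":
    # Hypothetical namespace and shard key, shown with a small chunk count.
    for cmd in split_commands("test.foo", "_id", 4):
        print(cmd)
```

Each document could then be sent to a mongos as an admin command, for example with a driver's generic command call, after the collection has been sharded and the balancer stopped.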
Sprint: Sharding 2016-09-19, Sharding 2016-10-10, Sharding 2016-10-31
I've been testing the speed of chunk migrations in an all-on-one-server test cluster. Even when the chunks being migrated are empty (i.e. the chunk move itself takes only ~0.1 secs), the entire cycle run by the balancer takes a lot longer.
||version||balance round time||
|v3.3.11-30-gc96009e|~ 9 secs|
From someone else's case with v3.2, on different servers and a different network from my test, I heard of a ~6 second cycle. I'm not sure whether that used a replica set config db or the older SCCC-style one.
Can the balancer be changed so that a balance round moves multiple chunks of each collection, so long as they finish quickly? E.g. the balance round identifies candidate chunks for migration, and keeps doing chunk moves for them serially until, say, a 10 sec window elapses.
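To make the request concrete, here is a minimal sketch of the time-windowed round in Python. It is not the balancer's actual code: `run_balance_round`, `move_chunk`, and the injectable `clock` are hypothetical names, and the 10 second window is just the figure suggested above.

```python
import time
from typing import Callable, Iterable, List


def run_balance_round(candidates: Iterable[str],
                      move_chunk: Callable[[str], None],
                      window_secs: float = 10.0,
                      clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Move candidate chunks one at a time until the window elapses.

    At least one chunk is always moved, which matches a round that moves
    a single chunk; further chunks are moved only while time remains.
    """
    deadline = clock() + window_secs
    moved: List[str] = []
    for chunk in candidates:
        # Check the deadline before each additional move, never mid-move.
        if moved and clock() >= deadline:
            break
        move_chunk(chunk)  # hypothetical stand-in for the real migration
        moved.append(chunk)
    return moved
```

With empty chunks taking ~0.1 secs each, a round like this would get through many moves in one window, while a single slow migration would still end the round after that one chunk.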
At any rate, if data has been completely deleted from a large fraction of chunk ranges before a new shard is added, it would be good if those chunk moves happened a lot more quickly.