- Type: Improvement
- Resolution: Incomplete
- Priority: Major - P3
- Affects Version/s: 3.6.12
- Component/s: Sharding
- Labels: None
We had an existing database cluster of 2 shards that backed a multi-tenant testing environment. That environment had the following:
dbs: 541, collections: 31560, chunks: 144372
That is, many of these databases contained over 50 collections, each representing a different supported feature, but each with very little data. Thus, they were subject to the default chunk count at the time they were created: 4 chunks, 2 on each of the shards.
As this test environment continued to grow, we found we needed to add a 3rd shard. We did, and that moved some chunks, but then it stopped. At this time, here is the chunk distribution:
{ "_id" : "shard-1", "count" : 67574 }
{ "_id" : "shard-2", "count" : 66715 }
{ "_id" : "shard-3", "count" : 10413 }
After confirming that the balancer is working properly, we think we've isolated this to a missing heuristic in the balancer.
The balancer balances each collection independently, and we've observed that working properly for years. However, it does not take into account the total number of chunks per shard, so in a situation such as this it does not properly balance these many small collections across the cluster.
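To illustrate the effect, here is a minimal sketch (not MongoDB's actual implementation) of per-collection balancing, assuming the documented migration threshold for collections with fewer than 20 chunks: a migration is triggered only when the difference between the shard with the most chunks and the shard with the fewest reaches 2. Under that rule, a 4-chunk collection that starts at 2/2/0 after the new shard is added settles at 2/1/1 and is then considered balanced, even though, aggregated over thousands of such collections, the new shard holds far fewer chunks than the old ones:

```python
def balance_collection(chunks_per_shard, threshold=2):
    """Move one chunk at a time from the fullest shard to the emptiest
    shard until the spread drops below the migration threshold.
    threshold=2 models the documented value for collections with
    fewer than 20 chunks (an assumption for this sketch)."""
    shards = dict(chunks_per_shard)
    while max(shards.values()) - min(shards.values()) >= threshold:
        donor = max(shards, key=shards.get)
        recipient = min(shards, key=shards.get)
        shards[donor] -= 1
        shards[recipient] += 1
    return shards

# One small collection: 4 chunks, 2 on each of the two original shards.
# After a single migration the spread is 1, below the threshold, so the
# balancer stops with the new shard holding half as many chunks.
print(balance_collection({"shard-1": 2, "shard-2": 2, "shard-3": 0}))
```

Because each collection is evaluated in isolation, every one of them reaches a state the balancer considers done, and no heuristic ever compares the aggregate chunk counts per shard, which is the behavior we believe explains the skewed totals above.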