- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 6.0 Required, 7.0 Required, 8.0 Required
- Component/s: None
By default, the initial split strategy for resharding is implemented in the following function:
InitialSplitPolicy::ShardCollectionConfig SamplingBasedSplitPolicy::createFirstChunks(OperationContext* opCtx, const ShardKeyPattern& shardKey, const SplitPolicyParams& params)
The selectShardAndAppendChunk lambda determines the new data distribution for the resharded collection. However, determining the data distribution by the number of chunks per shard is outdated.
auto selectShardAndAppendChunk = [&](const BSONObj& chunkMin, const BSONObj& chunkMax) {
    auto bestShard = selectBestShard(
        chunkDistribution, zoneInfo, zoneToShardMap, {chunkMin, chunkMax});
    appendChunk(params, chunkMin, chunkMax, &version, bestShard, &chunks);
    chunkDistribution[bestShard]++;
    lastChunkMax = chunkMax;
};
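To make the concern concrete, here is a minimal, self-contained sketch (not the actual server code; all names are illustrative) of the chunk-count heuristic: each new chunk is assigned to whichever shard currently owns the fewest chunks, with no regard to how many bytes each shard holds.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <string>

// Simplified stand-in for selectBestShard: pick the shard that currently
// owns the fewest chunks. This mirrors the chunk-count heuristic, which
// ignores the actual data size stored on each shard.
std::string selectBestShardByChunkCount(
    const std::map<std::string, int>& chunkDistribution) {
    std::string best;
    int fewest = INT_MAX;
    for (const auto& [shard, numChunks] : chunkDistribution) {
        if (numChunks < fewest) {
            fewest = numChunks;
            best = shard;
        }
    }
    return best;
}
```

With this heuristic, a shard holding one very large chunk still wins over a shard holding two tiny ones, which is exactly why the resulting distribution can be unbalanced by size.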
Since 6.0.3, the balancer decides how to migrate data based on each shard's data size for the collection rather than its chunk count.
I think the resharding policy should be consistent with the balancer. Otherwise, we cannot claim that the reshardCollection operation is much faster than the alternative range-migration procedure.
Currently, range migrations are still necessary after a resharding operation to finish balancing the data. The performance of the two approaches is not directly comparable, because they produce different data distributions. What we should compare is the time from the start of the operation until the data is fully balanced.
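The proposal above can be sketched the same way. This is a hypothetical illustration, not the server implementation: shard selection keyed on per-shard data size for the collection (as the balancer has done since 6.0.3) instead of chunk count.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical size-aware selection: pick the shard currently holding the
// fewest bytes of the collection, so the initial split is already aligned
// with the size-based balancer and needs no follow-up range migrations.
std::string selectBestShardByDataSize(
    const std::map<std::string, int64_t>& bytesPerShard) {
    std::string best;
    int64_t smallest = std::numeric_limits<int64_t>::max();
    for (const auto& [shard, bytes] : bytesPerShard) {
        if (bytes < smallest) {
            smallest = bytes;
            best = shard;
        }
    }
    return best;
}
```

The trade-off is that the sampling-based split policy would need a per-shard size estimate (e.g. from sampled document sizes) rather than a simple chunk counter, but the output distribution would then match what the balancer converges to anyway.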