Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Sharding
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-07-27, Sharding 2020-08-10

Add a new subclass of InitialSplitPolicy that picks split points and creates chunks for the new shard key based on the existing data in the collection. This ticket covers only the case where zones are not defined (or an empty zones list is passed in).
The class needs to define two important methods:
1. Constructor
- The constructor should take numInitialChunks (which is decided as part of configsvrReshardCollection) and calculate the new split points (perhaps in a way similar to SplitPointsBasedSplitPolicy).
- When none of the shard key fields are hashed, the new subclass should calculate the split points using the following steps:
- Call ClusterAggregate::runAggregate with the pipeline defined below. The oversampling ratio "s" in the pipeline should be a constant; for now, let s == 10. Oversampling yields a more accurate sample of the existing documents.
- We need `numInitialChunks - 1` split points, but the aggregation returns about "s" times as many documents. Exhaust the cursor returned by ClusterAggregate::runAggregate and save every "s"-th document as a split point, with the 0th document as the first split point.
- Pipeline to pass to runAggregate, where "s" is the oversampling ratio and the $project and $sort stages have one entry per shard key field:
  [
    { $sample: { size: numInitialChunks * s } },
    { $project: { _id: 0, "sk0": { $ifNull: ["$shardKeyField0", null] }, "sk1": { $ifNull: ["$shardKeyField1", null] }, ... } },
    { $sort: { sk0: 1, sk1: 1, ... } }
  ]
2. createFirstChunks
- createFirstChunks should create `numInitialChunks` Chunk objects from the calculated split points; appendChunk should be useful here. To pick the shard that owns each chunk, distribute the chunks round-robin among all shards that are allowed to own chunks for this collection (this should be all shards that owned chunks for the original collection).
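The split-point selection and the round-robin chunk assignment described above can be sketched in standalone C++. This is a hypothetical simplification, not the InitialSplitPolicy API: ShardKey, ChunkSketch, pickSplitPoints and createChunksRoundRobin are illustrative names, shard key values are modeled as vectors of ints instead of BSON, and std::nullopt stands in for MinKey/MaxKey. The sketch assumes the samples arrive already ordered by the $sort stage. Because $sample requests numInitialChunks * s documents, taking every s-th document from index 0 can yield numInitialChunks candidates, so the sketch assumes collection stops once numInitialChunks - 1 split points are saved.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a shard key value: one entry per shard key field.
// The server code would use BSON; plain ints keep the sketch self-contained.
using ShardKey = std::vector<int>;

// Hypothetical simplified chunk: std::nullopt for `min` means MinKey and
// std::nullopt for `max` means MaxKey.
struct ChunkSketch {
    std::optional<ShardKey> min;
    std::optional<ShardKey> max;
    std::string shardId;
};

// Picks split points from samples that the $sort stage has already ordered.
// Keeps the 0th, s-th, 2s-th, ... sample and stops once
// numInitialChunks - 1 split points have been collected.
std::vector<ShardKey> pickSplitPoints(const std::vector<ShardKey>& sortedSamples,
                                      std::size_t numInitialChunks,
                                      std::size_t s) {
    assert(s > 0 && numInitialChunks > 0);
    std::vector<ShardKey> splitPoints;
    for (std::size_t i = 0;
         i < sortedSamples.size() && splitPoints.size() + 1 < numInitialChunks;
         i += s) {
        splitPoints.push_back(sortedSamples[i]);
    }
    return splitPoints;
}

// Builds chunks covering [MinKey, MaxKey) from the split points and assigns
// them round-robin to the shards allowed to own chunks for the collection.
std::vector<ChunkSketch> createChunksRoundRobin(const std::vector<ShardKey>& splitPoints,
                                                const std::vector<std::string>& shardIds) {
    assert(!shardIds.empty());
    std::vector<ChunkSketch> chunks;
    std::optional<ShardKey> lower;  // the first chunk starts at MinKey
    for (std::size_t i = 0; i <= splitPoints.size(); ++i) {
        std::optional<ShardKey> upper;  // stays MaxKey for the last chunk
        if (i < splitPoints.size()) {
            upper = splitPoints[i];
        }
        chunks.push_back({lower, upper, shardIds[i % shardIds.size()]});
        lower = upper;
    }
    return chunks;
}
```

With split points k1 < k2 < ... < kn the chunks are [MinKey, k1), [k1, k2), ..., [kn, MaxKey), so numInitialChunks - 1 split points produce numInitialChunks chunks. For 12 sorted samples, numInitialChunks = 4 and s = 3, the sketch keeps samples 0, 3 and 6 and alternates the four chunks between two shards. It does not deduplicate equal sampled keys; a real implementation would likely need to, since repeated split points would produce empty chunks.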
For the purposes of this ticket (and until SERVER-49214 is completed), if the shard key contains a hashed field, we can simply return one chunk covering the entire shard key range and place it on the primary shard.
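The choice between this hashed-key fallback and the sampling path, together with the sampling pipeline for an arbitrary number of shard key fields, can be sketched as below. ShardKeyField, hasHashedField and buildSamplePipeline are hypothetical names; the sketch renders the pipeline as a string only to make its shape visible, whereas the server code would presumably build BSON objects instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field of the new shard key.
struct ShardKeyField {
    std::string name;
    bool hashed;
};

// Returns true if any shard key field is hashed. Per this ticket, such keys
// get a single [MinKey, MaxKey) chunk on the primary shard instead of
// sampling, until SERVER-49214 is completed.
bool hasHashedField(const std::vector<ShardKeyField>& fields) {
    for (const auto& field : fields) {
        if (field.hashed) {
            return true;
        }
    }
    return false;
}

// Renders the sampling pipeline from this ticket as text, with one sk<i>
// alias per shard key field in both the $project and $sort stages.
std::string buildSamplePipeline(const std::vector<ShardKeyField>& fields,
                                long long numInitialChunks,
                                long long s) {
    std::string project = "{ $project: { _id: 0";
    std::string sort = "{ $sort: { ";
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const std::string alias = "sk" + std::to_string(i);
        // $ifNull maps a missing shard key field to null.
        project += ", \"" + alias + "\": { $ifNull: [\"$" + fields[i].name + "\", null] }";
        sort += (i == 0 ? "" : ", ") + alias + ": 1";
    }
    project += " } }";
    sort += " } }";
    return "[ { $sample: { size: " + std::to_string(numInitialChunks * s) + " } }, " +
           project + ", " + sort + " ]";
}
```

For a non-hashed shard key on fields a and b with numInitialChunks = 4 and s = 10, buildSamplePipeline returns a pipeline whose $sample stage requests 40 documents and whose $project and $sort stages contain sk0 (from "$a") and sk1 (from "$b"). If either field were hashed, hasHashedField would return true and the sampling path would be skipped.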
We should add unit tests for this in initial_split_policy_test.cpp.
is depended on by:
SERVER-49525 Sample documents to pick new split points for resharding when new shard key has a hashed field (Closed)