This is a proposal for a new option, "splitPoints", for the "shardCollection" command.
The option is primarily needed by the core database itself, but it can also be convenient for users.
We sometimes tell people to pre-split and pre-distribute chunks by scripting it manually in JavaScript.
Syntax:
{ shardCollection: "ns", key:
, splitPoints: [ "a", "b", "c" ] }
Use case:
- for aggregation (MapReduce / aggregation framework), there is a need to create sharded output with known split points
- for users, to avoid migrations when importing a lot of data
Implementation:
The implementation is trivial, since this command already handles splitting the initial chunks, which is a more complex operation.
The difference is that here the split points are provided, and the chunks are distributed across all shards.
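To illustrate the idea, here is a minimal sketch of how provided split points could map to initial chunks. It is not the server implementation. The function name planInitialChunks, the "MinKey"/"MaxKey" string sentinels, the shard names, and the round-robin assignment policy are all assumptions made for this example.

```javascript
// Hypothetical sentinels standing in for the lowest and highest key bounds.
const MIN_KEY = "MinKey";
const MAX_KEY = "MaxKey";

// Turn N sorted split points into N + 1 chunk ranges and assign them
// round-robin across the given shards (assumed policy, for illustration).
function planInitialChunks(splitPoints, shards) {
  if (shards.length === 0) {
    throw new Error("at least one shard is required");
  }
  for (let i = 1; i < splitPoints.length; i++) {
    if (!(splitPoints[i - 1] < splitPoints[i])) {
      throw new Error("split points must be strictly ascending");
    }
  }
  const bounds = [MIN_KEY, ...splitPoints, MAX_KEY];
  const chunks = [];
  for (let i = 0; i < bounds.length - 1; i++) {
    chunks.push({
      min: bounds[i],          // inclusive lower bound
      max: bounds[i + 1],      // exclusive upper bound
      shard: shards[i % shards.length],
    });
  }
  return chunks;
}

const plan = planInitialChunks(["a", "b", "c"], ["shard0", "shard1"]);
console.log(plan);
```

Because assignment starts with the first entry of the shard list, putting the primary shard first would give it at least one chunk under this policy.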
Alternative:
Issue a split and a moveChunk command for each chunk.
This requires taking the distributed lock on the namespace for each command and can cause major delays.
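For comparison, the manual alternative could be sketched roughly as follows. The helper name manualPresplitCommands, the namespace "test.users", the key field "name", and the shard names are hypothetical. The sketch also assumes a single-field shard key and that the first shard in the list is the primary shard, which holds every chunk before any move.

```javascript
// Build the admin commands the manual approach would issue: one split per
// split point, then one moveChunk for each chunk that must leave the primary.
function manualPresplitCommands(ns, keyField, splitPoints, shards) {
  const commands = [];
  for (const point of splitPoints) {
    commands.push({ split: ns, middle: { [keyField]: point } });
  }
  // After the splits, each split point is the inclusive lower bound of one
  // chunk, so a "find" on that value selects the chunk starting there.
  splitPoints.forEach((point, i) => {
    const target = shards[(i + 1) % shards.length];
    if (target !== shards[0]) {
      // Chunks targeted at the primary shard (shards[0]) are already there.
      commands.push({ moveChunk: ns, find: { [keyField]: point }, to: target });
    }
  });
  return commands;
}

const cmds = manualPresplitCommands("test.users", "name", ["a", "b", "c"], ["shard0", "shard1"]);
console.log(cmds.length + " commands, each taking the distributed lock");
```

In the mongo shell, each generated document would be passed to db.adminCommand(...) in turn, so the number of lock acquisitions grows with the number of chunks.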
Questions:
- when picking shards to distribute chunks to, should it make sure there is at least one chunk on the primary shard?
- should this be internal use only or user facing?