[SERVER-72666] The presplit optimisation of shardCollection will always assign a chunk to the primary shard Created: 10/Jan/23  Updated: 26/Oct/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Paolo Polato Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: oldshardingemea, shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Catalog and Routing
Participants:
Story Points: 2

 Description   

When shardCollection is invoked on an empty collection specifying a hashed key, the DDL performs an optimisation: it splits the key space into a number of chunks (set by the numInitialChunks parameter) and distributes them across different shards, so that the collection is added to the sharding catalog already balanced.

The distribution is not random: chunks are assigned to shards in alphabetical order of shard ID, except that the primary shard is guaranteed to always receive at least one chunk.

This strategy may lead to data imbalance at the cluster level when:

  • there is a high number of empty collections being sharded with a hashed key
  • the value of numInitialChunks is lower than the number of shards in the cluster.
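
The effect can be illustrated with a small simulation. This is only a simplified model, not the server's implementation: it assumes the primary shard receives the first chunk and the remaining chunks are handed out round-robin over the other shards in alphabetical order. The function names, shard IDs and collection counts are hypothetical.

```python
from collections import Counter


def assign_initial_chunks(shard_ids, primary, num_initial_chunks):
    """Return the owning shard of each initial chunk.

    Assumed model (not taken from the server code): the primary shard gets
    the first chunk, then the other shards follow in alphabetical order.
    """
    others = sorted(s for s in shard_ids if s != primary)
    order = [primary] + others
    return [order[i % len(order)] for i in range(num_initial_chunks)]


def simulate(shard_ids, primaries, num_initial_chunks):
    """Count chunks per shard after sharding one collection per primary."""
    totals = Counter({s: 0 for s in shard_ids})
    for primary in primaries:
        totals.update(assign_initial_chunks(shard_ids, primary, num_initial_chunks))
    return dict(totals)


# Hypothetical cluster: 4 shards, 100 empty collections whose primary shard
# is shard-d, each sharded with numInitialChunks = 2 (fewer than the shards).
shards = ["shard-a", "shard-b", "shard-c", "shard-d"]
totals = simulate(shards, ["shard-d"] * 100, 2)
print(totals)
```

Under these assumptions, only shard-d (the primary) and shard-a (first in alphabetical order) ever receive chunks, while shard-b and shard-c stay empty regardless of how many collections are sharded.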

To mitigate this effect (in addition to what is documented in SERVER-72650), the constraint of always placing at least one chunk on the primary shard of the collection being sharded could be removed.


Generated at Thu Feb 08 06:22:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.