[SERVER-77360] Evaluate $sample distributive properties when partitioning shard key space Created: 22/May/23  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Pierlauro Sciarelli
Assignee: Backlog - Catalog and Routing
Resolution: Unresolved
Votes: 0
Labels: oldshardingemea, shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on WT-11532 Fix session reset RNG by using cursor... Closed
is depended on by SERVER-68050 Change resharding split policy to cre... Blocked
Assigned Teams:
Catalog and Routing
Participants:
Story Points: 2

Description

As a precondition for SERVER-68050, we should test the sampling behavior to see whether, and by how much, the data distribution between shards would become more skewed if we initially created only 1 chunk per shard. Since sampling does not split the shard key space into exactly even parts, the theory is that having more chunks better "spreads the risk": by distributing several heterogeneously sized chunks across shards, we may end up with a more balanced distribution.

Some experiments with a large number of chunks have shown a difference of more than 10GB between shards for a total collection size of 500GB.
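
One way to start evaluating the hypothesis above is a small offline simulation before running tests against a real cluster. The sketch below is not the server's implementation: it assumes split points are taken at regular intervals from a sorted random sample of shard key values, that chunks are placed on shards round-robin, and that the sample size grows with the number of chunks. The function name `simulate_imbalance` and the parameters `chunks_per_shard` and `samples_per_chunk` are hypothetical and chosen only for illustration.

```python
import bisect
import random


def simulate_imbalance(keys, num_shards, chunks_per_shard, samples_per_chunk, rng):
    """Return per-shard document counts after sample-based splitting.

    Assumptions (not taken from the server code): split points are picked
    from a sorted random sample, and chunks are placed round-robin.
    """
    num_chunks = num_shards * chunks_per_shard
    # Draw a random sample of shard key values (a stand-in for $sample).
    sample = sorted(rng.sample(keys, num_chunks * samples_per_chunk))
    # Every samples_per_chunk-th sampled key becomes a chunk boundary.
    split_points = [sample[i * samples_per_chunk] for i in range(1, num_chunks)]
    # Count documents per chunk; chunk i covers [split[i-1], split[i]).
    chunk_sizes = [0] * num_chunks
    for key in keys:
        chunk_sizes[bisect.bisect_right(split_points, key)] += 1
    # Assumed placement: deal chunks to shards round-robin.
    shard_sizes = [0] * num_shards
    for i, size in enumerate(chunk_sizes):
        shard_sizes[i % num_shards] += size
    return shard_sizes


if __name__ == "__main__":
    rng = random.Random(42)
    # Uniformly distributed keys; swap in a skewed distribution as needed.
    keys = [rng.random() for _ in range(100_000)]
    for chunks_per_shard in (1, 10, 100):
        sizes = simulate_imbalance(keys, 4, chunks_per_shard, 10, rng)
        print(chunks_per_shard, "chunk(s) per shard, max-min docs:", max(sizes) - min(sizes))
```

Comparing the max-min spread across several seeds and key distributions would give a first indication of how much the chunk count matters. Whether the result carries over to a real cluster depends on how closely these assumptions match the actual sampling and placement logic.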

