[SERVER-77360] Evaluate $sample distributive properties when partitioning shard key space
Created: 22/May/23 Updated: 26/Oct/23
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Pierlauro Sciarelli | Assignee: | Backlog - Catalog and Routing |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | oldshardingemea, shardingemea-qw |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Catalog and Routing |
| Story Points: | 2 |
| Description |
As a precondition for SERVER-68050, we should test the sampling behavior to find out whether, and by how much, the data distribution across shards would become more skewed if we initially created only one chunk per shard. Because sampling does not split the shard key space into exactly even parts, the theory is that having more chunks better "spreads the risk": when many chunks of heterogeneous sizes are distributed across the shards, their size differences tend to average out, which may yield a more balanced distribution. For reference, some experiments with a large number of chunks have shown a difference of more than 10GB between shards for a collection of 500GB in total.
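A minimal simulation sketch of the kind of experiment proposed above, comparing the shard imbalance produced by one chunk per shard against several chunks per shard. Everything in it is a hypothetical model, not the server's behavior: shard key values are uniformly distributed, all documents have the same size (so document counts stand in for GB), split points are evenly spaced quantiles of a uniform random sample, and chunks are assigned to shards round-robin. It does not reproduce how `$sample` actually draws documents or how chunks are actually distributed; the function names are illustrative only.

```python
import bisect
import random


def shard_sizes(keys, num_shards, chunks_per_shard, sample_size, rng):
    """Return per-shard document counts for one hypothetical initial split.

    Assumptions (hypothetical model): split points are evenly spaced quantiles
    of a uniform random sample of the keys, and chunks go to shards round-robin.
    """
    num_chunks = num_shards * chunks_per_shard
    if sample_size < num_chunks:
        raise ValueError("sample must contain at least one key per chunk")
    sample = sorted(rng.sample(keys, sample_size))
    # num_chunks - 1 split points taken at evenly spaced positions in the sorted sample.
    splits = [sample[(i * sample_size) // num_chunks] for i in range(1, num_chunks)]
    sizes = [0] * num_shards
    for key in keys:
        chunk = bisect.bisect_right(splits, key)  # index of the chunk owning this key
        sizes[chunk % num_shards] += 1  # round-robin chunk-to-shard assignment
    return sizes


def relative_imbalance(sizes):
    """Largest minus smallest shard, as a fraction of the whole collection."""
    return (max(sizes) - min(sizes)) / sum(sizes)


def average_imbalance(num_docs, num_shards, chunks_per_shard, sample_size, trials, seed=0):
    """Mean relative imbalance over several independently generated collections."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        keys = [rng.random() for _ in range(num_docs)]  # uniform shard key values
        sizes = shard_sizes(keys, num_shards, chunks_per_shard, sample_size, rng)
        total += relative_imbalance(sizes)
    return total / trials


if __name__ == "__main__":
    for chunks_per_shard in (1, 2, 8, 32):
        avg = average_imbalance(num_docs=20_000, num_shards=4,
                                chunks_per_shard=chunks_per_shard,
                                sample_size=1_000, trials=10)
        print(f"chunks/shard={chunks_per_shard:3d}  avg max-min imbalance={avg:.2%}")
```

Comparing the printed imbalance for `chunks_per_shard=1` against the larger values gives a first answer to the "spreads the risk" question, but only under these idealized assumptions. Evaluating the real behavior would require replacing the uniform `rng.sample` call with samples drawn the way `$sample` actually draws them, and the round-robin assignment with the server's actual chunk placement.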