Core Server / SERVER-77360

Evaluate $sample distributive properties when partitioning shard key space

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • 2

      As a precondition for SERVER-68050, it would be good to test the sampling behavior to see whether, and by how much, the data distribution between shards would become more skewed if we initially created only 1 chunk per shard. Because $sample does not split the shard key space into exactly even parts, the theory is that having more chunks better "spreads the risk": by distributing several heterogeneously sized chunks across shards, we may end up with a more balanced distribution.

      Some experiments with a large number of chunks have shown a difference of more than 10GB between shards for a collection with a total size of 500GB, that is, more than 2% of the collection.
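
      A rough way to explore the "spreads the risk" theory offline is a small simulation. The sketch below is not the server's implementation: the function names, the uniform synthetic keys, the fixed number of samples per chunk (10 here), and the round-robin chunk-to-shard assignment are all assumptions made for illustration. It picks split points from a random sample of the keys, counts the documents that land on each shard, and reports the average gap between the fullest and emptiest shard as a fraction of the collection.

```python
import bisect
import random


def shard_doc_counts(keys_sorted, num_shards, chunks_per_shard, samples_per_chunk, rng):
    """Return the number of documents per shard after sample-based splitting.

    Assumptions (hypothetical, not taken from the server code): split points are
    every samples_per_chunk-th key of a sorted random sample, and chunks are
    assigned to shards round-robin.
    """
    num_chunks = num_shards * chunks_per_shard
    sample = sorted(rng.sample(keys_sorted, num_chunks * samples_per_chunk))
    # num_chunks - 1 boundaries taken at regular intervals of the sorted sample.
    split_points = [sample[i * samples_per_chunk] for i in range(1, num_chunks)]
    # Translate each boundary key into its position in the full sorted key list.
    bounds = [0]
    bounds += [bisect.bisect_left(keys_sorted, p) for p in split_points]
    bounds.append(len(keys_sorted))
    counts = [0] * num_shards
    for chunk in range(num_chunks):
        counts[chunk % num_shards] += bounds[chunk + 1] - bounds[chunk]
    return counts


def mean_spread(num_docs, num_shards, chunks_per_shard,
                samples_per_chunk=10, trials=20, seed=0):
    """Average (max - min) shard size over several trials, as a fraction of num_docs."""
    rng = random.Random(seed)
    keys_sorted = sorted(rng.random() for _ in range(num_docs))
    total = 0.0
    for _ in range(trials):
        counts = shard_doc_counts(
            keys_sorted, num_shards, chunks_per_shard, samples_per_chunk, rng
        )
        total += (max(counts) - min(counts)) / num_docs
    return total / trials


if __name__ == "__main__":
    # Compare 1 chunk per shard against larger initial chunk counts.
    for chunks_per_shard in (1, 10, 100):
        spread = mean_spread(200_000, 4, chunks_per_shard)
        print(f"{chunks_per_shard:>3} chunk(s) per shard: mean spread {spread:.4f}")
```

      Under these assumptions the total sample grows with the number of chunks, so part of any improvement comes from sampling more keys rather than from the chunk count alone. Holding the total sample size fixed while varying the chunk count would help separate the two effects.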

            Assignee:
            backlog-server-catalog-and-routing [DO NOT USE] Backlog - Catalog and Routing
            Reporter:
            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
            Votes:
            0
            Watchers:
            5
