Resharding does not need to sample documents if the key is hashed


    • Cluster Scalability
    • Fully Compatible
    • v8.1, v8.0, v7.0
    • ClusterScalability Apr14-Apr28

      Resharding uses SamplingBasedSplitPolicy, which is a descendant of the regular InitialSplitPolicy base class.

      The function calculateHashedSplitPoints, defined on the parent class, is only used by the other child classes
      (SplitPointsBasedSplitPolicy::SplitPointsBasedSplitPolicy, i.e. its constructor, and AbstractTagsBasedSplitPolicy). Based on code inspection, SamplingBasedSplitPolicy does not rely on this method.

      If the shard key consists of a single hashed field, we do not need to sample: a hash function spreads distinct key values roughly uniformly across the 64-bit hash space, so that space can be split deterministically among the recipients. This lets us avoid known issues with the $sample implementation and makes the final distribution of chunks mirror the distribution of the customer's data without the downsides of sampling.
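      A deterministic split of the hashed key space could look roughly like the minimal C++ sketch below. It is not MongoDB's calculateHashedSplitPoints and may place boundaries differently; ShardKeyField, isSingleHashedField, hashedSplitPoints and recipientForHash are hypothetical names. It assumes hashed shard key values are signed 64-bit integers, one chunk per recipient, chunk i assigned to recipient i, and chunk ranges that include their lower bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical, simplified view of one shard key field; this is NOT
// MongoDB's KeyPattern/ShardKeyPattern API.
struct ShardKeyField {
    std::string name;
    bool hashed;  // true for {field: "hashed"}
};

// Sampling can be skipped only when the whole key is one hashed field,
// e.g. {x: "hashed"}; compound keys still need the data distribution.
bool isSingleHashedField(const std::vector<ShardKeyField>& key) {
    return key.size() == 1 && key.front().hashed;
}

// Map an unsigned offset from INT64_MIN back to a signed hash value
// without relying on signed overflow.
int64_t fromOffset(uint64_t offset) {
    const uint64_t half = uint64_t{1} << 63;
    if (offset < half) {
        return std::numeric_limits<int64_t>::min() + static_cast<int64_t>(offset);
    }
    return static_cast<int64_t>(offset - half);
}

// Split the full signed 64-bit hash space into `numChunks` ranges of
// (nearly) equal width and return the numChunks - 1 interior boundaries.
std::vector<int64_t> hashedSplitPoints(std::size_t numChunks) {
    std::vector<int64_t> points;
    if (numChunks <= 1) return points;
    const uint64_t n = numChunks;
    // floor(2^64 / n) without a 128-bit type, using 2^64 = u64Max + 1.
    const uint64_t u64Max = std::numeric_limits<uint64_t>::max();
    const uint64_t step = u64Max / n + (u64Max % n == n - 1 ? 1 : 0);
    for (uint64_t i = 1; i < n; ++i) {
        points.push_back(fromOffset(i * step));  // i * step < 2^64, no wraparound
    }
    return points;
}

// Chunk i covers [points[i-1], points[i]), so the chunk (and, under the
// one-chunk-per-recipient assumption, the recipient) holding a hash value
// is the number of split points <= that value.
std::size_t recipientForHash(int64_t hash, const std::vector<int64_t>& points) {
    return static_cast<std::size_t>(
        std::upper_bound(points.begin(), points.end(), hash) - points.begin());
}
```

      For example, with four recipients hashedSplitPoints(4) yields boundaries at -2^62, 0 and 2^62, and a document whose hashed key is 0 lands on recipient 2. No documents are read to compute these boundaries, which is the point of skipping $sample for this shard key shape.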

              Assignee:
              Kruti Shah
              Reporter:
              Lamont Nelson
              Votes:
              0
              Watchers:
              11
