  1. Core Server
  2. SERVER-103318

Investigate different partitioning strategies and select one

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • 8.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams

      For the ObjectId case, we tentatively have the following partitioning strategies:

      1) bucketAuto (being implemented as part of https://jira.mongodb.org/browse/SERVER-102099)
      2) Random sampling
      3) Use the information in the ObjectIds (timestamp) to come up with the ranges
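
      As a rough illustration of strategy (3), the sketch below derives range boundaries from the timestamp embedded in the first 4 bytes of an ObjectId. The function names and the equal-time-span split are assumptions made for illustration, not a decided design; ObjectIds are handled as 24-character hex strings so that no driver is needed.

```python
def objectid_boundary(ts_seconds: int) -> str:
    """Return the smallest ObjectId (as hex) whose embedded timestamp is ts_seconds."""
    # An ObjectId is 12 bytes; the first 4 bytes are a big-endian Unix
    # timestamp in seconds. The remaining 8 bytes are zeroed here.
    return ts_seconds.to_bytes(4, "big").hex() + "00" * 8


def timestamp_ranges(min_id_hex: str, max_id_hex: str, num_partitions: int):
    """Split [min_id, max_id] into at most num_partitions half-open ranges
    [lower, upper) that each cover an equal span of time.

    Assumes the maximum timestamp is below 2**32 - 1 so the exclusive
    upper bound still fits in 4 bytes.
    """
    lo = int(min_id_hex[:8], 16)
    hi = int(max_id_hex[:8], 16) + 1  # exclusive bound, one second past the max
    step = max(1, -(-(hi - lo) // num_partitions))  # ceiling division
    bounds = list(range(lo, hi, step)) + [hi]
    return [
        (objectid_boundary(a), objectid_boundary(b))
        for a, b in zip(bounds, bounds[1:])
    ]
```

      Equal time spans only give balanced partitions if documents were inserted at a roughly constant rate, which is one reason this strategy would still need to be compared against the other two.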

      Using bucketAuto would be simplest. We need to measure how long the bucketAuto phase takes and, if it is "too large", use one of the other two strategies instead. To evaluate this, we can set up an M40 cluster with a 100 GB collection and some random ongoing writes into the cluster, have an SP do an initialSync of this collection, and time the bucketAuto phase (which we will be able to do from Splunk logs). Repeat for collection sizes ranging from 100 GB to 1 TB and see how the time scales. If the time scales sublinearly, or linearly but is still reasonable when extrapolated to 50 TB, we can use the bucketAuto strategy.
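
      One possible way to judge how the measured times scale is to fit a line to log(time) against log(size): a slope near 1 suggests linear scaling and a slope below 1 suggests sublinear scaling. The sketch below is only an analysis aid under that power-law assumption; the function names are hypothetical and the inputs would be the timings taken from the experiment described above.

```python
import math


def scaling_exponent(sizes_gb, seconds):
    """Least-squares fit of log(seconds) = intercept + slope * log(size).

    Returns (slope, intercept). All inputs must be positive, and at least
    two distinct sizes are needed.
    """
    xs = [math.log(s) for s in sizes_gb]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_seconds(slope, intercept, size_gb):
    """Predict the duration for size_gb from the fitted power law."""
    return math.exp(intercept + slope * math.log(size_gb))
```

      Calling extrapolate_seconds with a size of 50000 (50 TB in GB) gives a projected duration that can then be judged against whatever is considered reasonable. Extrapolating 50x beyond the largest measured size is inherently uncertain, so the projection is a guide rather than a guarantee.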

      For the random sampling approach, we will do the following:
      Let M be the parallelism specified by the SP. To minimize the chances of having extremely imbalanced buckets, we will create more than M buckets; the exact number will depend on the collection size.
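
      A minimal sketch of the sampling idea follows, using an in-memory list of integers as a stand-in for the collection's _id values. The function names, the oversample factor, and the choice of using one fewer split point than buckets are assumptions for illustration; in a real deployment the sample would have to come from the server rather than from a list.

```python
import bisect
import random


def sample_boundaries(ids, parallelism, oversample=10, seed=None):
    """Pick sorted split points from a uniform random sample of ids.

    Targets parallelism * oversample buckets, which needs one fewer split
    point than buckets. Fewer are returned if ids is too small.
    """
    num_buckets = parallelism * oversample
    rng = random.Random(seed)
    k = min(len(ids), num_buckets - 1)
    return sorted(rng.sample(ids, k))


def bucket_index(boundaries, value):
    """Return the bucket for value; a value equal to a split point goes to
    the bucket on its right, so buckets are half-open [lower, upper)."""
    return bisect.bisect_right(boundaries, value)
```

      Creating more buckets than M means each worker processes several smaller buckets, so a single unlucky split point has less effect on the overall balance.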

      The overall strategy could look like this:
      1) For collections smaller than 10 GB, the whole collection is one partition
      2) For collections of at least 10 GB but below some upper threshold (to be selected after running the tests above), we will use bucketAuto
      3) For collections at or above the upper threshold of (2), we will use random sampling: sample 10 * M times and create 10 * M partitions, where M is the parallelism.
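
      The tiered selection above could be expressed roughly as follows. This is a sketch only: the names are hypothetical, the upper threshold is left as a required parameter because it has not been chosen yet, and treating "10G" as 10 GiB and "10M" as ten times the parallelism M are assumptions.

```python
from enum import Enum

GIB = 1024 ** 3  # assumption: sizes are measured in binary gigabytes


class Strategy(Enum):
    SINGLE_PARTITION = "single_partition"
    BUCKET_AUTO = "bucket_auto"
    RANDOM_SAMPLING = "random_sampling"


def choose_strategy(collection_bytes, bucket_auto_limit_bytes,
                    small_limit_bytes=10 * GIB):
    """Map a collection size to one of the three tiers.

    bucket_auto_limit_bytes is the upper threshold that would be picked
    after the timing tests; it is deliberately not given a default.
    """
    if collection_bytes < small_limit_bytes:
        return Strategy.SINGLE_PARTITION
    if collection_bytes < bucket_auto_limit_bytes:
        return Strategy.BUCKET_AUTO
    return Strategy.RANDOM_SAMPLING


def sampling_partition_count(parallelism, factor=10):
    """Number of partitions for the random sampling tier (factor * M)."""
    return factor * parallelism
```

      Keeping the thresholds as parameters makes it easy to adjust them once the measurements are available.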

            Assignee: Mayuresh Kulkarni (mayuresh.kulkarni@mongodb.com)
            Reporter: Mayuresh Kulkarni (mayuresh.kulkarni@mongodb.com)