Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.2.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Atlas Streams
Backwards Compatibility: Fully Compatible
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

For the ObjectId case, we tentatively have the following partitioning strategies:

1) bucketAuto (being implemented as part of https://jira.mongodb.org/browse/SERVER-102099)
2) Random sampling
3) Use the information in the ObjectIds (timestamp) to come up with the ranges
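
As an illustration of strategy (3), here is a minimal sketch of how range boundaries could be derived from ObjectId timestamps. It relies only on the documented ObjectId layout (the first 4 of the 12 bytes are a big-endian Unix timestamp in seconds). The function names are hypothetical, and splitting the time span into equal slices is an assumption; the ticket does not specify how the ranges would be chosen.

```python
import struct


def objectid_floor(ts_seconds: int) -> str:
    """Smallest ObjectId (24-char hex) whose embedded timestamp is ts_seconds."""
    # 4-byte big-endian timestamp followed by 8 zero bytes.
    return struct.pack(">I", ts_seconds).hex() + "00" * 8


def objectid_timestamp(oid_hex: str) -> int:
    """Timestamp (seconds since the Unix epoch) embedded in an ObjectId hex string."""
    return int(oid_hex[:8], 16)


def timestamp_ranges(min_oid_hex: str, max_oid_hex: str, num_partitions: int):
    """Split [min_oid, max_oid] into contiguous [lower, upper) ObjectId ranges.

    Hypothetical helper: slices the time span evenly, which only balances the
    partitions if documents were inserted at a roughly constant rate.
    """
    lo = objectid_timestamp(min_oid_hex)
    hi = objectid_timestamp(max_oid_hex) + 1  # exclusive upper bound, in seconds
    span = hi - lo
    n = max(1, min(num_partitions, span))  # at most one partition per second
    cuts = [lo + (span * i) // n for i in range(n + 1)]
    return [(objectid_floor(cuts[i]), objectid_floor(cuts[i + 1])) for i in range(n)]
```

Equal time slices do not imply equal document counts, so bursty insert patterns would still give imbalanced partitions with this approach.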

Using bucketAuto would be simplest. We need to investigate how long the bucketAuto phase takes to finish; if it is "too long", we will use one of the other two strategies. To evaluate this, we can set up an M40 cluster with a collection of size 100G and some random ongoing writes into the cluster, have an SP do an initialSync of this collection, and time the bucketAuto phase (which we will be able to do from the Splunk logs). Repeat for collection sizes ranging from 100G to 1T and see how the time scales. If the time scales sublinearly, or linearly but with a reasonable projected time for 50T, we can use the bucketAuto strategy.
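
One way to judge how the time scales is to fit a power law t = a * size^b to the measured timings: an exponent b at or below 1 indicates linear or sublinear scaling, and the fit can be extrapolated to 50T. The sketch below is an assumption about how the analysis could be done, not something the ticket prescribes, and the numbers in it are placeholders rather than measurements.

```python
import math


def fit_power_law(sizes_gb, seconds):
    """Least-squares fit of log(t) = log(a) + b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes_gb]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # scaling exponent
    a = math.exp(mean_y - b * mean_x)  # time for a collection of size 1
    return a, b


def predict_seconds(a, b, size_gb):
    """Extrapolate the fitted power law to another collection size."""
    return a * size_gb ** b


# Placeholder values only (NOT real results): replace with timings from the logs.
sizes_gb = [100, 200, 400, 800]
seconds = [60.0, 120.0, 240.0, 480.0]
a, b = fit_power_law(sizes_gb, seconds)
# 50T is approximated here as 50,000 GB.
print(f"exponent b = {b:.2f}; projected time at 50T = {predict_seconds(a, b, 50_000):.0f} s")
```

A power-law extrapolation from 1T to 50T is a long reach, so the projected figure should be treated as a rough guide.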

For the random sampling approach, let M be the parallelism specified by the SP. To reduce the chance of extremely imbalanced buckets, we will create more than M buckets; the exact number will depend on the collection size.
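
A minimal sketch of how sampled keys could be turned into partitions: sort the sampled _id values and use them as cut points, leaving the first and last ranges open-ended. How the samples are obtained (for example, with an aggregation `$sample` stage) is an assumption, and the function name is hypothetical.

```python
def ranges_from_samples(sampled_ids):
    """Turn sampled keys into contiguous [lower, upper) ranges.

    None marks an open end: the first range has no lower bound and the last
    range has no upper bound, so the ranges cover the whole key space.
    Duplicate samples are dropped, so k distinct samples yield k + 1 ranges.
    """
    cuts = sorted(set(sampled_ids))
    bounds = [None] + cuts + [None]
    return list(zip(bounds[:-1], bounds[1:]))
```

Because the samples are drawn from the data itself, cut points cluster where documents are dense, and creating many more ranges than M lets workers pick up the next range as they finish, which smooths out any single oversized range.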

The overall strategy could look like this:
1) For collections smaller than 10G, the whole collection is one partition.
2) For collections of size >= 10G but below some upper threshold (to be selected after running the tests above), we will use bucketAuto.
3) For collections larger than the upper threshold in (2), we will use random sampling: sample 10M times (that is, 10 x M, where M is the parallelism) and create 10M partitions.
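
The selection logic above can be sketched as a small function. All names are hypothetical, "10G" is interpreted as 10 GiB, and the upper threshold is a parameter because the ticket leaves it to be determined by the tests. The ticket does not say how many buckets bucketAuto would produce, so that count is returned as None.

```python
GIB = 1024 ** 3
SMALL_COLLECTION_BYTES = 10 * GIB  # assumption: "10G" means 10 GiB


def choose_strategy(size_bytes, parallelism, upper_threshold_bytes, oversample=10):
    """Return (strategy_name, partition_count) for a collection.

    partition_count is None where the ticket does not specify a number.
    """
    if size_bytes < SMALL_COLLECTION_BYTES:
        # Small collection: the whole collection is a single partition.
        return ("single_partition", 1)
    if size_bytes < upper_threshold_bytes:
        # Mid-sized collection: bucket count is not specified in the ticket.
        return ("bucketAuto", None)
    # Large collection: sample 10 x M keys and create 10 x M partitions.
    return ("random_sampling", oversample * parallelism)
```

For example, `choose_strategy(2 * 1024 * GIB, 8, 1024 * GIB)` selects random sampling with 80 partitions for a 2 TiB collection, given a hypothetical 1 TiB threshold and a parallelism of 8.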

Depends on: SERVER-102099 Initial plumbing for InitialSync (Closed)

Assignee: Mayuresh Kulkarni
Reporter: Mayuresh Kulkarni
Participants: Mayuresh Kulkarni
Votes: 0
Watchers: 3

Created: Apr 03 2025 03:41:28 PM UTC
Updated: Jun 05 2025 07:11:05 PM UTC
Resolved: Jun 05 2025 07:11:03 PM UTC
Confidence Status Last Update: 08/Apr/25 3:01 PM
