Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- resharding-improvements
- resharding-performance-improvements

Assigned Teams: Cluster Scalability
Story Points: 2
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Based on HELP-84458, it appears that the resharding cloner can easily produce more load than replication can handle on certain hardware configurations. Today, the resharding cloner performs its writes locally and does not wait for them to be majority-replicated before proceeding to the next write. Since SERVER-103566, the first point after cloning all documents at which the recipient waits for replication is just before it builds indexes.

This is good for overall throughput, but it maximizes the strain resharding puts on replication and the rest of the system. We can already throttle resharding to some extent using parameters like reshardingCollectionClonerBatchSizeCount and reshardingCollectionClonerWriteThreadCount, but these are not directly aware of the current level of replication lag. Until SPM-2935 and SPM-4263 address this more completely, we may want to consider adding an option that forces the resharding cloner to await replication periodically, as a stopgap solution.
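
The proposed stopgap can be sketched as follows. This is a minimal illustration only, written in Python rather than the server's C++, and it assumes a hypothetical `wait_every_n_batches` option: the cloner keeps writing batches locally, but after every N batches it blocks until the last written optime is majority-committed. `write_batch`, `await_majority`, `FakeRecipient` and `wait_every_n_batches` are all hypothetical stand-ins, not existing MongoDB server APIs or parameters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ClonerStats:
    batches_written: int = 0
    replication_waits: int = 0


def clone_with_periodic_replication_waits(
    batches: Iterable[List[dict]],
    write_batch: Callable[[List[dict]], int],   # hypothetical: local write, returns its optime
    await_majority: Callable[[int], None],      # hypothetical: blocks until optime is majority-committed
    wait_every_n_batches: int,                  # hypothetical option; <= 0 keeps today's behavior
) -> ClonerStats:
    stats = ClonerStats()
    for batch in batches:
        # Local write without waiting for replication, as the cloner does today.
        last_optime = write_batch(batch)
        stats.batches_written += 1
        # Stopgap: periodically let replication catch up before continuing.
        if wait_every_n_batches > 0 and stats.batches_written % wait_every_n_batches == 0:
            await_majority(last_optime)
            stats.replication_waits += 1
    # No final wait here: the recipient already waits for replication
    # before building indexes once cloning finishes.
    return stats


class FakeRecipient:
    """Hypothetical stand-in: each inserted document advances a local optime by one."""

    def __init__(self) -> None:
        self.optime = 0
        self.waited_for: List[int] = []

    def write_batch(self, batch: List[dict]) -> int:
        self.optime += len(batch)
        return self.optime

    def await_majority(self, optime: int) -> None:
        # A real implementation would block on replication; this fake only records the call.
        self.waited_for.append(optime)


if __name__ == "__main__":
    recipient = FakeRecipient()
    batches = [[{"_id": 2 * i}, {"_id": 2 * i + 1}] for i in range(5)]
    stats = clone_with_periodic_replication_waits(
        batches, recipient.write_batch, recipient.await_majority, wait_every_n_batches=2
    )
    print(stats, recipient.waited_for)
```

With five two-document batches and a wait every two batches, the sketch waits twice, at optimes 4 and 8. A batch-count trigger is the simplest choice; unlike the approach targeted by SPM-2935 and SPM-4263, it is still not directly aware of the actual replication lag.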

Is related to:
- SERVER-100264 Resharding Natural Order Pipeline Does Not Respect reshardingCollectionClonerBatchSizeInBytes (Closed)
- SERVER-103566 Make ReshardingRecipientService wait for replication lag across all nodes to be some threshold before building indexes (Closed)

Assignee: Unassigned
Reporter: Brett Nawrocki
Participants: Brett Nawrocki
Votes: 0
Watchers: 4

Created: Nov 18 2025 08:08:45 PM UTC
Updated: Nov 24 2025 03:48:00 PM UTC