Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
Assigned Teams: Cluster Scalability
Operating System: ALL
Sprint: Cluster Scalability Priorities
Case:
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Resharding recipients write down a resume token transactionally with the latest batch returned by the aggregation. That token can later be used to resume the $natural scan on the source collection.

However, we obtain these batches by running getMores, which return only when the batch size is reached.

When a db primary shard is added as a recipient despite owning no data (see ~~SERVER-54279~~ for why we do this), that recipient's getMore never finds any matching documents, so it never returns and the recipient never marks progress. If a failover then occurs on that recipient, or one of the donors it is reading from is restarted, the aggregation fails and must be restarted from the beginning. This can block resharding from making forward progress, especially when the source collection is very large and the collection scan takes a long time.
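The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the resharding cloner's actual code: the source collection is a plain Python list, the resume token is just a scan position, and `get_more`, `persisted_token_at_failure`, and the `max_examined` parameter are hypothetical names invented for the illustration. The model also assumes a getMore returns once the scan is exhausted. `max_examined` stands in for some way of making the cursor return early even when nothing matched.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Batch:
    docs: List[int]
    resume_token: int  # scan position to resume from (stand-in for a real resume token)
    exhausted: bool


def get_more(source: List[int], start: int, matches: Callable[[int], bool],
             batch_size: int, max_examined: Optional[int] = None) -> Batch:
    """Toy getMore: scan source[start:] in natural order.

    Returns when batch_size matching documents were found or the scan ends.
    If max_examined is set (hypothetical knob), it also returns early after
    examining that many documents, even with an empty batch.
    """
    docs: List[int] = []
    pos = start
    examined = 0
    while pos < len(source):
        doc = source[pos]
        pos += 1
        examined += 1
        if matches(doc):
            docs.append(doc)
            if len(docs) == batch_size:
                break
        if max_examined is not None and examined >= max_examined:
            break
    return Batch(docs, pos, pos >= len(source))


def persisted_token_at_failure(source: List[int], matches: Callable[[int], bool],
                               batch_size: int, fail_at: int,
                               max_examined: Optional[int] = None) -> int:
    """Return the last resume token written before the scan passed fail_at.

    The token is only persisted when a getMore returns, so a getMore that is
    interrupted mid-scan contributes no progress.
    """
    token = 0
    while True:
        batch = get_more(source, token, matches, batch_size, max_examined)
        if batch.resume_token > fail_at:
            return token  # failure hit before this getMore returned
        token = batch.resume_token  # written together with the batch
        if batch.exhausted:
            return token


source = list(range(1000))
owns_nothing = lambda doc: False  # recipient that owns no data

# Without early return, the restart begins from position 0.
print(persisted_token_at_failure(source, owns_nothing, 10, fail_at=900))
# With a hypothetical early return every 100 examined documents, progress survives.
print(persisted_token_at_failure(source, owns_nothing, 10, fail_at=900, max_examined=100))
```

In this model the first call reports a persisted token of 0, meaning the whole scan is repeated after the failure, while the second reports 900. The model says nothing about how an early return should be implemented in the server; it only shows why progress that is recorded per returned batch is lost when no batch ever fills.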

add_limit_to_clone_pipeline.patch
6 kB
Dec 16 2025 07:08:46 PM UTC

duplicates

SERVER-111929 Make resharding only skip cloning when it doesn't own any chunks

Closed

is related to

SERVER-54279 Primary shard may end up with inconsistent collection catalog entry after resharding

Closed

SERVER-111883 Add ability for cursors to yield and return results early

Backlog

SERVER-91109 Optimize resharding when primary shard owns zero chunks for resharded collection

Closed

SERVER-111929 Make resharding only skip cloning when it doesn't own any chunks

Closed

related to

SERVER-112262 Add last timestamp of last agg/getMore sent by the resharding cloner to the state doc

Backlog

SERVER-112100 Expose some OpDebug info in curop

In Code Review

SERVER-119955 Resharding cloner can't store progress until it gets response from the query it sent

Needs Scheduling

Assignee: Unassigned
Reporter: Brett Nawrocki
Participants: Adi Zaimi, Brett Nawrocki
Votes: 0
Watchers: 21

Created: Sep 18 2025 06:12:14 PM UTC
Updated: Feb 19 2026 03:56:02 PM UTC
Resolved: Feb 19 2026 03:55:50 PM UTC
