Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Cluster Scalability
Operating System:
ALL
Sprint:
ClusterScalability Sep29-Oct13, ClusterScalability Oct13-Oct27, ClusterScalability Nov10-Nov24, ClusterScalability Nov24-Dec8, ClusterScalability Dec8-Dec22, ClusterScalability Dec22-Jan5, ClusterScalability Jan5-Jan19, ClusterScalability 19Jan-2Feb, ClusterScalability 2Feb-16Feb
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Resharding donors and recipients start change stream monitors but do not handle monitor failures immediately. Instead, errors remain unchecked until the participants later call awaitChangeStreamsMonitorCompleted(). This delays error detection and can allow operations to continue only to fail later.

Current Behavior:

When the participants start monitoring, the changeStreamsMonitor returns a semi future.
The returned future is stored but not immediately checked for errors.
If the monitor fails (e.g., due to network issues or cursor problems), the error sits unobserved in the completed future.
The donor or recipient continues with other operations, unaware of the monitor failure.
The error surfaces only much later, when awaitChangeStreamsMonitorCompleted() is called.
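
The sequence described under Current Behavior can be illustrated with a small standalone sketch. It is written in Python with `concurrent.futures.Future` rather than the server's C++ futures, and every name in it (`Participant`, `start_monitor`, `run_step`, `await_monitor_completed`, the step names, the error message) is hypothetical and chosen only for illustration. It is not the resharding code.

```python
from concurrent.futures import Future


class Participant:
    """Hypothetical stand-in for a resharding donor or recipient."""

    def __init__(self):
        self.monitor_future = None
        self.steps_run = []

    def start_monitor(self, monitor_future):
        # The future is stored, but nothing inspects it at this point.
        self.monitor_future = monitor_future

    def run_step(self, name):
        # Later phases proceed without consulting the monitor future.
        self.steps_run.append(name)

    def await_monitor_completed(self):
        # Only here does a stored failure reach the participant.
        return self.monitor_future.result()


def demo_delayed_detection():
    participant = Participant()
    monitor_future = Future()
    participant.start_monitor(monitor_future)

    # The monitor fails early (assumed failure mode for the example).
    monitor_future.set_exception(RuntimeError("cursor killed"))

    # The participant keeps going, unaware of the failure.
    participant.run_step("clone")
    participant.run_step("apply oplog")

    try:
        participant.await_monitor_completed()
    except RuntimeError as exc:
        return participant.steps_run, str(exc)
    return participant.steps_run, None
```

In this sketch both steps run after the monitor has already failed, and the error is seen only when `await_monitor_completed()` is finally called.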

~~SERVER-104946~~ will commit a unit test demonstrating this behavior on the donor. A possible solution is to add a background task that watches the change streams monitor future and handles errors immediately.
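
One way to picture the proposed background handling is a completion callback attached to the monitor future when monitoring starts. The sketch below is a minimal illustration under stated assumptions: it uses Python's `Future.add_done_callback` and a `threading.Event` as an abort flag, and the names `EagerParticipant`, `abort_requested`, and `run_step` are hypothetical. How the real participants would react to the error (for example, through their cancellation handling) is not specified by this ticket.

```python
import threading
from concurrent.futures import Future


class EagerParticipant:
    """Hypothetical participant that reacts to monitor failure right away."""

    def __init__(self):
        self.monitor_future = None
        self.abort_requested = threading.Event()
        self.abort_reason = None
        self.steps_run = []

    def start_monitor(self, monitor_future):
        self.monitor_future = monitor_future
        # Observe completion as soon as it happens instead of waiting
        # for a later explicit await.
        monitor_future.add_done_callback(self._on_monitor_done)

    def _on_monitor_done(self, future):
        if future.cancelled():
            return
        error = future.exception()  # does not block: the future is done
        if error is not None:
            self.abort_reason = error
            self.abort_requested.set()

    def run_step(self, name):
        # Each step checks the flag, so a failed monitor stops progress early.
        if self.abort_requested.is_set():
            raise RuntimeError(f"aborted before {name}: {self.abort_reason}")
        self.steps_run.append(name)
```

With this shape, a step attempted after the monitor fails is rejected immediately rather than running to completion and failing at the final await.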

depends on

SERVER-114114 Update resharding participants to use a cancellationTokenHolder

Closed

is depended on by

SERVER-114077 Make sure that there can never be dangling _shardsvrRecipientCriticalSectionStarted threads when resharding gets aborted both implicitly and explicitly

Backlog

is related to

SERVER-104946 Create unit tests to trigger an unrecoverable error at every phase of resharding

Closed

related to

SERVER-109032 ReshardingChangeStreamMonitor on completion needs to be handled correctly

Backlog

Assignee: Kruti Shah
Reporter: Kruti Shah
Participants: Kruti Shah
Votes: 0
Watchers: 4

Created: Aug 06 2025 04:54:48 AM UTC
Updated: Feb 17 2026 05:55:49 PM UTC
