Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- cs-impact-3
- cs-neweng

Assigned Teams:

Cluster Scalability
Sprint:
Cluster Scalability Priorities
Story Points:
2
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The ReshardingCoordinator treats PrimaryOnlyService's interrupt() call as a no-op. PrimaryOnlyService calls this method to interrupt instances on stepdown or shutdown, as well as when an instance is released.

The resharding coordinator exposes several public futures (e.g. the completion future) that are fulfilled only if run() is called. However, nothing guarantees that run() is ever called, for example if a stepdown were to occur here.

In theory, a waiter on the coordinator's completion future could hang, or receive a broken promise error, if run() was never called.
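The two failure modes can be reproduced in isolation with standard C++ futures. This is a minimal sketch, assuming std::promise/std::shared_future as a stand-in for the server's own promise and future types; ToyCoordinator, waiterWouldHang, waiterSeesBrokenPromise and demoHazard are hypothetical names, not the real ReshardingCoordinator API. While the instance is alive and run() has not been called, a waiter blocks indefinitely. Once the instance is released, the unsatisfied promise is destroyed and the waiter sees broken_promise.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <system_error>

// Hypothetical, heavily simplified stand-in for a PrimaryOnlyService instance.
// Not the real ReshardingCoordinator API: it only models the fact that the
// completion promise is fulfilled inside run() and nowhere else.
class ToyCoordinator {
public:
    ToyCoordinator() : completionFuture_(completionPromise_.get_future().share()) {}

    std::shared_future<void> getCompletionFuture() const { return completionFuture_; }

    // The only place the completion promise is ever fulfilled.
    void run() { completionPromise_.set_value(); }

    // Mirrors the current behavior described in this ticket: a no-op.
    void interrupt() {}

private:
    std::promise<void> completionPromise_;
    std::shared_future<void> completionFuture_;
};

// True if the future is still not ready after `timeout`, i.e. an untimed
// wait on it would keep blocking.
bool waiterWouldHang(const std::shared_future<void>& f, std::chrono::milliseconds timeout) {
    return f.wait_for(timeout) == std::future_status::timeout;
}

// True if get() throws broken_promise, which std::promise reports when it is
// destroyed without ever being satisfied.
bool waiterSeesBrokenPromise(const std::shared_future<void>& f) {
    try {
        f.get();
        return false;
    } catch (const std::future_error& e) {
        return e.code() == std::make_error_code(std::future_errc::broken_promise);
    }
}

// Stepdown arrives before run() is scheduled, then the instance is released.
void demoHazard() {
    std::shared_future<void> fut;
    {
        ToyCoordinator coord;
        fut = coord.getCompletionFuture();
        coord.interrupt();  // no-op: the waiter stays blocked
        assert(waiterWouldHang(fut, std::chrono::milliseconds(10)));
    }  // coord destroyed with its promise unsatisfied
    assert(waiterSeesBrokenPromise(fut));
}
```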

The two existing callers of the completion future are:

ConfigsvrReshardCollection
- This properly synchronizes with the RSTL, so if run() was never called because of a stepdown, this command would be interrupted anyway.

ConfigsvrAbortReshardCollection
- This does not properly synchronize with the RSTL, so it can evade the RSTL killOp thread and keep running as a secondary. If that race occurs and run() is also never called, the command would likely hang until the node steps up again. At that point it is either killed by the RSTL killOp thread (correctly, this time) or fails with a broken promise error once the instance is cleaned up as part of PrimaryOnlyService's step-up logic.

The above uses the completion future as an example, but similar issues could exist for every other future the coordinator exposes. All promises should likely be set with an error in interrupt(), similar to what is done in the ShardingCoordinator.
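One possible shape for that change is sketched below, again using std::promise in place of the server's promise types. OncePromise, SketchCoordinator, CoordinatorInterrupted, getOtherFuture() and demoInterruptFailsWaiters are all hypothetical; this is not the ShardingCoordinator's code or the SERVER-127612 wrapper, only an illustration of the idea that every exposed promise is completed exactly once, by run() or by interrupt(), so no waiter depends on run() having been called.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical error standing in for the server's interruption status
// (an assumption; real code would use its own error codes).
struct CoordinatorInterrupted : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: a promise completed at most once, by whichever of
// run() or interrupt() reaches it first.
class OncePromise {
public:
    OncePromise() : future_(promise_.get_future().share()) {}

    std::shared_future<void> future() const { return future_; }

    void setValue() {
        std::lock_guard<std::mutex> lk(mutex_);
        if (!set_) {
            promise_.set_value();
            set_ = true;
        }
    }

    void setError(std::exception_ptr err) {
        std::lock_guard<std::mutex> lk(mutex_);
        if (!set_) {
            promise_.set_exception(err);
            set_ = true;
        }
    }

private:
    std::mutex mutex_;
    std::promise<void> promise_;
    std::shared_future<void> future_;
    bool set_ = false;
};

// Hypothetical coordinator exposing two futures; only the completion future
// is named in this ticket, the second one is illustrative.
class SketchCoordinator {
public:
    std::shared_future<void> getCompletionFuture() const { return completion_.future(); }
    std::shared_future<void> getOtherFuture() const { return other_.future(); }

    void run() {
        other_.setValue();
        completion_.setValue();
    }

    // Instead of a no-op, fail every promise that is still outstanding so
    // that no waiter can hang or see broken_promise.
    void interrupt(const std::string& reason) {
        auto err = std::make_exception_ptr(CoordinatorInterrupted(reason));
        completion_.setError(err);
        other_.setError(err);
    }

private:
    OncePromise completion_;
    OncePromise other_;
};

// True if waiting on the future surfaces the interruption error.
bool failedWithInterrupt(const std::shared_future<void>& f) {
    try {
        f.get();
        return false;
    } catch (const CoordinatorInterrupted&) {
        return true;
    }
}

// Stepdown before run(): every waiter is woken with an error.
void demoInterruptFailsWaiters() {
    SketchCoordinator coord;
    auto completion = coord.getCompletionFuture();
    auto other = coord.getOtherFuture();
    coord.interrupt("stepdown before run()");
    assert(failedWithInterrupt(completion));
    assert(failedWithInterrupt(other));
}
```

The per-promise mutex and flag are there because std::promise throws promise_already_satisfied if it is set twice, and run() and interrupt() can race. Whichever side wins, the other becomes a no-op for that promise.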

related to

SERVER-127612 Introduce ReshardingCoordinatorPromises wrapper

Backlog

Assignee:: Unassigned
Reporter:: Brett Nawrocki
Participants:: Brett Nawrocki
Votes:: 0
Watchers:: 1

Created:: May 06 2026 08:20:50 PM UTC
Updated:: May 26 2026 09:22:28 PM UTC
