Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 9.0.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Cluster Scalability
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: ClusterScalability 13Apr-27Apr, ClusterScalability 27Apr-11May
Linked BF Score: 200
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

If a resharding operation is aborted by abortReshardCollection or a setFCV downgrade command while the coordinator is transitioning from kInitializing to kPreparingToDonate, recipient shards can be left stuck with orphaned state machines in config.localReshardingOperations.recipient. The recipients remain in awaiting-fetch-timestamp indefinitely because they are never notified of the abort.

When the resharding coordinator transitions to kPreparingToDonate, the disk write commits first, making participants aware of the resharding operation. The in-memory _coordinatorDoc update runs afterward using the same interruptible OperationContext. If an abort cancels that opCtx in the window between the disk write and the in-memory update, the in-memory _coordinatorDoc state still reads kInitializing. The abort handler dispatches on _coordinatorDoc.getState() and, seeing state < kPreparingToDonate, takes the coordinator-only abort path, which skips notifying participants and leaves recipients stuck with orphaned state machines.
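The window described above can be illustrated with a small toy model. This is only a sketch and not the server code: `ToyCoordinator`, `Interrupted`, `run`, and the `dispatch_on_disk_state` flag are hypothetical names invented for illustration, and dispatching on the durable state is shown purely as a contrast case, not as a description of the actual fix.

```python
from enum import IntEnum


class State(IntEnum):
    INITIALIZING = 1
    PREPARING_TO_DONATE = 2


class Interrupted(Exception):
    """Hypothetical stand-in for an interrupted OperationContext."""


class ToyCoordinator:
    def __init__(self):
        self.disk_state = State.INITIALIZING     # durable coordinator document
        self.memory_state = State.INITIALIZING   # in-memory _coordinatorDoc
        self.recipients_aware = False            # recipient state machines exist
        self.recipients_notified_of_abort = False

    def transition_to_preparing_to_donate(self, abort_in_window):
        # Step 1: the disk write commits; participants learn of the operation.
        self.disk_state = State.PREPARING_TO_DONATE
        self.recipients_aware = True
        # Step 2: the same interruptible context guards the in-memory update,
        # so an abort landing in this window skips the update entirely.
        if abort_in_window:
            raise Interrupted()
        self.memory_state = State.PREPARING_TO_DONATE

    def abort(self, dispatch_on_disk_state=False):
        # The behavior described in the ticket dispatches on the in-memory state.
        state = self.disk_state if dispatch_on_disk_state else self.memory_state
        if state < State.PREPARING_TO_DONATE:
            return "coordinator-only"  # participants are never notified
        self.recipients_notified_of_abort = True
        return "notify-participants"


def run(abort_in_window, dispatch_on_disk_state=False):
    coordinator = ToyCoordinator()
    try:
        coordinator.transition_to_preparing_to_donate(abort_in_window)
    except Interrupted:
        pass
    path = coordinator.abort(dispatch_on_disk_state)
    stuck = (coordinator.recipients_aware
             and not coordinator.recipients_notified_of_abort)
    return path, stuck


if __name__ == "__main__":
    print(run(abort_in_window=True))    # abort lands inside the window
    print(run(abort_in_window=False))   # abort lands after the in-memory update
    print(run(abort_in_window=True, dispatch_on_disk_state=True))
```

In this model, only the first call leaves recipients aware of the operation but never told about the abort, which matches the stuck awaiting-fetch-timestamp symptom.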

This is a different manifestation of the same fundamental problem as ~~SERVER-92857~~.

is related to: SERVER-92857 Resharding Coordinator's abort hangs if it encounters an unrecoverable error while establishing participants (Closed)

Assignee: Abdul Qadeer
Reporter: Abdul Qadeer
Participants: Abdul Qadeer, Githook User, TPM Jira Automations Bot
Votes: 0
Watchers: 4

Created: Apr 21 2026 12:49:32 AM UTC
Updated: May 04 2026 12:45:34 PM UTC
Resolved: Apr 28 2026 02:38:54 AM UTC
