Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:

Assigned Teams: Cluster Scalability
Operating System: ALL
Linked BF Score: 200
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

To establish participants for resharding, the resharding coordinator sends the FlushRoutingTableCacheUpdates command to all donors and recipients. Each shard eventually attempts to insert a new state document if a resharding state machine does not already exist, but any NotPrimary errors thrown by that insert are swallowed.

In a recent BF, the insert of the new state document failed with InterruptedDueToReplStateChange. The FlushRoutingTableCacheUpdates command still succeeded, so the recipient shard was never established as a participant. As a result, resharding hung and the test timed out.
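The failure mode can be illustrated with a small toy model. This is only a sketch in Python, not the server's actual code: `ToyShard`, `ReplStateChangeError`, and `establish_participants` are hypothetical names, and the error handling is a simplified assumption based on the description in this ticket.

```python
from dataclasses import dataclass, field


class ReplStateChangeError(Exception):
    """Hypothetical stand-in for a NotPrimary-style error such as
    InterruptedDueToReplStateChange."""


@dataclass
class ToyShard:
    """Toy participant shard (hypothetical; not a real server class)."""
    name: str
    fail_insert: bool = False  # simulate a replication state change during the insert
    state_docs: list = field(default_factory=list)

    def insert_state_document(self, resharding_id):
        if self.fail_insert:
            raise ReplStateChangeError("interrupted due to repl state change")
        self.state_docs.append(resharding_id)

    def flush_routing_table_cache_updates(self, resharding_id):
        """Toy command handler: tries to create the state document,
        swallows the error, and reports success either way."""
        try:
            if resharding_id not in self.state_docs:
                self.insert_state_document(resharding_id)
        except ReplStateChangeError:
            pass  # swallowed: the caller never learns the insert failed
        return {"ok": 1}


def establish_participants(shards, resharding_id):
    """Toy coordinator step: send the command to every shard and report
    the replies plus which shards actually hold a state document."""
    replies = {s.name: s.flush_routing_table_cache_updates(resharding_id) for s in shards}
    established = [s.name for s in shards if resharding_id in s.state_docs]
    return replies, established


donor = ToyShard("donor0")
recipient = ToyShard("recipient0", fail_insert=True)
replies, established = establish_participants([donor, recipient], "resharding-op-1")
```

In this toy model every reply is `{"ok": 1}`, yet only `donor0` ends up with a state document. The coordinator has no signal that `recipient0` needs a retry, which mirrors the hang described above.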

This could be resolved by SPM-4126, but we should consider whether it is worth fixing before that project is done.

is related to

SERVER-92857 Resharding Coordinator's abort hangs if it encounters an unrecoverable error while establishing participants

Backlog

Assignee: Unassigned
Reporter: Ben Gawel (Inactive)
Participants: Ben Gawel
Votes: 0
Watchers: 3

Created: Sep 16 2025 06:40:42 PM UTC
Updated: Nov 05 2025 10:30:47 PM UTC
