Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Cluster Scalability
ALL
200
To establish participants for resharding, the resharding coordinator sends the FlushRoutingTableCacheUpdates command to all donors and recipients. If a resharding state machine does not already exist, we eventually attempt to insert a new state document, but we swallow any NotPrimary errors.
In a recent BF, the insert of the new state document fails with InterruptedDueToReplStateChange. That error falls under the NotPrimary error category and is swallowed, so the FlushRoutingTableCacheUpdates command still succeeds and the recipient shard is never established as a participant. As a result, resharding hangs and the test times out.
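The failure mode can be modeled with a minimal, simplified sketch. It assumes a synchronous flow and uses hypothetical names (`Participant`, `insert_state_doc`, `flush_routing_table_cache_updates`, `establish_participants`) that are not MongoDB server APIs. It only shows how swallowing NotPrimary-category errors lets the command report success while no state document exists on the recipient.

```python
# Hypothetical model of the bug; none of these names are MongoDB server APIs.


class NotPrimaryError(Exception):
    """Stand-in for errors in the server's NotPrimary error category."""


class InterruptedDueToReplStateChange(NotPrimaryError):
    """Stand-in for the error the state document insert threw in the BF."""


class Participant:
    def __init__(self, fail_insert_with=None):
        self.state_doc = None  # None means no resharding state machine exists
        self.fail_insert_with = fail_insert_with

    def insert_state_doc(self):
        # Simulate the insert failing (e.g. due to a replica set state change).
        if self.fail_insert_with is not None:
            raise self.fail_insert_with
        self.state_doc = {"state": "initialized"}

    def flush_routing_table_cache_updates(self):
        """Returns True (command success) even when the insert was swallowed."""
        if self.state_doc is None:
            try:
                self.insert_state_doc()
            except NotPrimaryError:
                # Swallowed: nothing tells the coordinator the insert failed.
                pass
        return True


def establish_participants(participants):
    # The coordinator only sees command success, not whether a state doc exists.
    return all(p.flush_routing_table_cache_updates() for p in participants)
```

With a healthy donor and a recipient whose insert raises `InterruptedDueToReplStateChange`, `establish_participants` returns `True` while the recipient's `state_doc` stays `None`, which mirrors the recipient never being established as a participant.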
This could be resolved by SPM-4126, but we should consider whether it is worth fixing before that project.
Is related to: SERVER-92857 Resharding Coordinator's abort hangs if it encounters an unrecoverable error while establishing participants
Backlog