- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ALL
- v8.3
- 200
To establish participants for resharding, the resharding coordinator sends the FlushRoutingTableCacheUpdates command to all donors and recipients. While processing that command, we eventually attempt to insert a new state document if a resharding state machine does not already exist, but we swallow any NotPrimary errors thrown by that insert.
In a recent BF, the insert of the new state document failed with InterruptedDueToReplStateChange, yet the FlushRoutingTableCacheUpdates command still succeeded, so the recipient shard was never established as a participant. As a result, resharding hangs and the test times out.
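The failure mode can be illustrated with a small simulation. This is only a sketch in Python, not the server code: the `Participant` class, its methods, and the exception hierarchy are hypothetical stand-ins, and it assumes InterruptedDueToReplStateChange is treated as a NotPrimary-category error, which is what would cause it to be swallowed.

```python
class NotPrimaryError(Exception):
    """Hypothetical stand-in for the NotPrimary error category."""


class InterruptedDueToReplStateChange(NotPrimaryError):
    """Assumption: this error falls under the NotPrimary category."""


class Participant:
    """Hypothetical model of a donor or recipient shard."""

    def __init__(self, name, fail_insert=False):
        self.name = name
        self.fail_insert = fail_insert
        self.state_doc = None  # no resharding state machine yet

    def insert_state_document(self):
        if self.fail_insert:
            raise InterruptedDueToReplStateChange(f"{self.name} changed repl state")
        self.state_doc = {"shard": self.name, "state": "initialized"}

    def flush_routing_table_cache_updates(self):
        """Reports success even when the state document insert was interrupted."""
        try:
            if self.state_doc is None:
                self.insert_state_document()
        except NotPrimaryError:
            pass  # swallowed: the coordinator never learns the insert failed
        return True


def establish_participants(participants):
    """Return (shards that acked the command, shards that hold a state document)."""
    acked = [p.name for p in participants if p.flush_routing_table_cache_updates()]
    established = [p.name for p in participants if p.state_doc is not None]
    return acked, established


donor = Participant("donor0")
recipient = Participant("recipient0", fail_insert=True)
acked, established = establish_participants([donor, recipient])
print("acked:", acked)
print("established:", established)
```

In this sketch both shards acknowledge the command, but only the donor ends up with a state document, so a coordinator that waits for the recipient to report progress would wait indefinitely.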
This could be resolved by SPM-4126, but we should consider whether it is worth fixing before that project starts.
- depends on
  - SERVER-129377 ShardsvrReshard[Donor|Recipient]Initialize can return success without majority-committing the state document (In Code Review)
  - SERVER-129060 Ensure that resharding donor majority commits its state before acknowledging coordinator commands (Closed)
- is related to
  - SERVER-92857 Resharding Coordinator's abort hangs if it encounters an unrecoverable error while establishing participants (Closed)