- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ALL
_shardsvrRecipientCriticalSectionStarted is the command that the coordinator runs against recipients to notify them that the critical section has started (SERVER-114004). To make sure that no recipient can get stuck waiting for this notification when there is a failover on the coordinator or on the recipient itself, the command also waits for the recipient to have transitioned to "strict-consistency". It does so by waiting on the _inStrictConsistencyOrError promise in the ReshardingRecipientService.
It turns out that the _inStrictConsistencyOrError promise is not fulfilled when the resharding operation is aborted, because it is not fulfilled once the abort token has been cancelled. SERVER-114005 makes sure the coordinator stops waiting for _shardsvrRecipientCriticalSectionStarted responses as soon as the resharding operation has been aborted, whether explicitly by the user or implicitly due to the critical section timeout, so the bug does not cause a hang. However, it does leave a dangling _shardsvrRecipientCriticalSectionStarted thread on that recipient, since _inStrictConsistencyOrError is not fulfilled.
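The failure mode can be illustrated with a small toy model. This is a sketch in Python, not MongoDB server code: RecipientModel, critical_section_started, and the fulfill_on_abort flag are hypothetical names, and the timeout exists only so the demonstration terminates (the waiter described in this ticket has no such timeout). Setting the promise to an error on abort is shown as one possible direction, which is an assumption rather than a fix stated in this ticket.

```python
from concurrent.futures import Future, TimeoutError as FutureTimeout


class RecipientModel:
    """Toy stand-in for a resharding recipient (hypothetical, not server code)."""

    def __init__(self, fulfill_on_abort: bool):
        # Models _inStrictConsistencyOrError.
        self.in_strict_consistency_or_error = Future()
        self._fulfill_on_abort = fulfill_on_abort

    def reach_strict_consistency(self) -> None:
        if not self.in_strict_consistency_or_error.done():
            self.in_strict_consistency_or_error.set_result("strict-consistency")

    def abort(self) -> None:
        # With fulfill_on_abort=False the future stays pending after an abort,
        # which mirrors the behaviour described in this ticket.
        if self._fulfill_on_abort and not self.in_strict_consistency_or_error.done():
            self.in_strict_consistency_or_error.set_exception(
                RuntimeError("resharding aborted")
            )


def critical_section_started(recipient: RecipientModel, timeout: float) -> str:
    """Models the command handler waiting on the recipient's future.

    Returns "ok", "aborted", or "dangling". The timeout is an artifact of the
    demo: without it, the "dangling" case would block forever.
    """
    try:
        recipient.in_strict_consistency_or_error.result(timeout=timeout)
        return "ok"
    except FutureTimeout:
        return "dangling"
    except RuntimeError:
        return "aborted"


if __name__ == "__main__":
    buggy = RecipientModel(fulfill_on_abort=False)
    buggy.abort()
    print(critical_section_started(buggy, timeout=0.05))

    fixed = RecipientModel(fulfill_on_abort=True)
    fixed.abort()
    print(critical_section_started(fixed, timeout=0.05))
```

In the first case the waiter only returns because of the demo timeout, which corresponds to the dangling thread on the recipient. In the second case the waiter is released with an error as soon as the abort is processed.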
- depends on
  - SERVER-108852 Resharding participants don't handle change streams monitor failures immediately (Open)
- is related to
  - SERVER-114005 Resharding critical section timeout should cancel remaining steps on coordinator (Closed)
  - SERVER-114004 Add command for resharding coordinator to notify recipients that critical section has started (Closed)