Make sure that there can never be dangling _shardsvrRecipientCriticalSectionStarted threads when resharding gets aborted, whether implicitly or explicitly


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • ALL

      _shardsvrRecipientCriticalSectionStarted is the command that the coordinator runs against recipients to notify them that the critical section has started (SERVER-114004). To make sure that no recipient can get stuck waiting for this notification when there is a failover on the coordinator or on the recipient itself, this command also waits for the recipient to have transitioned to "strict-consistency". It does that by waiting for the _inStrictConsistencyOrError promise in the ReshardingRecipientService.

      It turns out that this promise is not fulfilled when the resharding operation is aborted, because it is never fulfilled once the abort token has been cancelled. SERVER-114005 makes sure the coordinator stops waiting for _shardsvrRecipientCriticalSectionStarted responses as soon as the resharding operation has been aborted, whether explicitly by the user or implicitly due to a critical section timeout, so the bug would not cause a hang. However, it would leave a dangling _shardsvrRecipientCriticalSectionStarted thread on that recipient, since _inStrictConsistencyOrError would never be fulfilled.
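
The failure mode can be illustrated with a small toy model. This is only a sketch and not MongoDB code: RecipientSketch, run_command, command_thread_dangles and the fulfill_on_abort flag are hypothetical names, and setting an error on the promise when the operation is aborted is an assumed remedy used for illustration, not a fix this ticket has settled on. A Python Future stands in for _inStrictConsistencyOrError, and a worker thread stands in for the thread serving _shardsvrRecipientCriticalSectionStarted.

```python
import threading
from concurrent.futures import Future


class RecipientSketch:
    """Toy stand-in for a recipient state machine (hypothetical, not MongoDB code)."""

    def __init__(self, fulfill_on_abort):
        # Plays the role of the _inStrictConsistencyOrError promise.
        self.in_strict_consistency_or_error = Future()
        self._fulfill_on_abort = fulfill_on_abort

    def abort(self):
        # Variant with the bug: the abort path leaves the promise untouched.
        # Assumed remedy: fulfill the promise with an error on abort.
        if self._fulfill_on_abort and not self.in_strict_consistency_or_error.done():
            self.in_strict_consistency_or_error.set_exception(
                RuntimeError("resharding aborted"))


def run_command(recipient, outcome):
    """Stands in for the thread serving the notification command."""
    try:
        outcome.append(recipient.in_strict_consistency_or_error.result())
    except RuntimeError as exc:
        outcome.append(str(exc))


def command_thread_dangles(fulfill_on_abort, wait_secs=0.2):
    """Return True if the command thread is still blocked after the abort."""
    recipient = RecipientSketch(fulfill_on_abort)
    outcome = []
    worker = threading.Thread(
        target=run_command, args=(recipient, outcome), daemon=True)
    worker.start()
    recipient.abort()
    worker.join(timeout=wait_secs)
    dangling = worker.is_alive()
    if dangling:
        # Release the blocked thread so the demonstration itself leaks nothing.
        recipient.in_strict_consistency_or_error.set_exception(
            RuntimeError("released by the demonstration"))
        worker.join()
    return dangling
```

Calling command_thread_dangles(False) models the behaviour described above, where the waiter stays blocked after the abort, while command_thread_dangles(True) models the assumed remedy, where the waiter wakes up with an error and the thread exits.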

            Assignee:
            Unassigned
            Reporter:
            Cheahuychou Mao
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: