  1. Core Server
  2. SERVER-55852

Shards first acquire LockManager locks before reacting to abortReshardCollection

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Story Points:
      2

      Description

      The abortReshardCollection command causes each shard to refresh via the _flushReshardingStateChange command. That command first acquires the database and collection locks to check whether the critical section is held and, if it is not, acquires the same locks again as part of onShardVersionMismatch(). These lock acquisitions can block if a strong lock request is enqueued on the shard. However, writes stalled behind that strong lock may be the very reason the user ran abortReshardCollection in the first place. Because abortReshardCollection then waits for the strong lock request to be granted and released, an end user would additionally need to run killOp on operations from internal (system) threads for the server to make forward progress, which undermines the utility of the abortReshardCollection command.
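
      The blocking described above can be illustrated with a toy model. This is only a sketch, not the server's LockManager: ToyLock, its strict first-come queueing rule, and the lock modes assigned to each requester (including IS for _flushReshardingStateChange) are assumptions made for illustration. It uses the standard multi-granularity compatibility matrix to show a new intent-lock request waiting behind a queued exclusive request even though it is compatible with everything currently granted.

```python
from collections import deque

# Standard multi-granularity lock compatibility (symmetric).
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


class ToyLock:
    """Hypothetical fair lock: a request also waits behind queued conflicts."""

    def __init__(self):
        self.granted = []       # (owner, mode) pairs currently holding the lock
        self.waiting = deque()  # (owner, mode) pairs queued in arrival order

    def request(self, owner, mode):
        """Grant only if compatible with every granted AND every queued request."""
        others = [m for _, m in self.granted] + [m for _, m in self.waiting]
        if all(m in COMPATIBLE[mode] for m in others):
            self.granted.append((owner, mode))
            return True
        self.waiting.append((owner, mode))
        return False


lock = ToyLock()
reader_granted = lock.request("long-running reader", "IS")        # granted
strong_granted = lock.request("internal thread", "X")             # queued behind the reader
write_granted = lock.request("user write", "IX")                  # queued behind the X request
flush_granted = lock.request("_flushReshardingStateChange", "IS") # also queued behind X
```

      In this model the flush request is stuck for the same reason the user's writes are, so the abort cannot make progress until the strong lock request has been granted and released.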

      We should instead have an explicit {_shardsvrAbortReshardCollection: <reshardingUUID>} command that interacts with the DonorStateMachines and RecipientStateMachines directly. Note that the coordinator's decision is irreversible, so 'pushing' the decision out, as opposed to having the participant shards 'pull' it via a shard version refresh, is still safe in the presence of delayed messages.
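
      A minimal sketch of why a pushed, irreversible decision tolerates delayed messages follows. ToyParticipant, its two states, and the on_abort method are hypothetical names, not the server's DonorStateMachine or RecipientStateMachine interfaces. The sketch assumes the participant matches the abort against the resharding UUID carried by the command and treats the aborted state as terminal, so a late or duplicated abort is a no-op.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID, uuid4


class State(Enum):
    RUNNING = "running"
    ABORTED = "aborted"  # terminal: the coordinator never reverses an abort


@dataclass
class ToyParticipant:
    """Hypothetical stand-in for a donor or recipient state machine."""

    resharding_uuid: UUID
    state: State = State.RUNNING

    def on_abort(self, resharding_uuid: UUID) -> bool:
        """Apply a pushed abort; return True only if the state changed."""
        if resharding_uuid != self.resharding_uuid:
            return False  # abort for a different operation: ignore it
        if self.state is State.ABORTED:
            return False  # duplicate or delayed copy: nothing left to do
        self.state = State.ABORTED
        return True


current = uuid4()
stale = uuid4()
donor = ToyParticipant(current)

results = [
    donor.on_abort(stale),    # delayed message for some other operation
    donor.on_abort(current),  # the coordinator's pushed decision
    donor.on_abort(current),  # the same decision, redelivered late
]
```

      Because no message can move the participant out of the aborted state, the order and number of deliveries do not affect the final outcome in this model.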

              People

              Assignee:
              backlog-server-sharding-nyc Backlog - Sharding NYC
              Reporter:
              max.hirschhorn Max Hirschhorn
