Core Server / SERVER-60858

_configsvrReshardCollection command which joins existing ReshardingCoordinator may miss being interrupted on stepdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc2
    • Affects Version/s: 5.0.0, 5.1.0-rc0
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.1, v5.0
    • Sprint: Sharding 2021-11-01
    • 1

      The primary shard for the database running the _shardsvrReshardCollection command will re-send the _configsvrReshardCollection command to the config server primary following a retryable error. The new invocation of the _configsvrReshardCollection command will join the existing ReshardingCoordinator instance rather than constructing a new one. However, when this happens, setAlwaysInterruptAtStepDownOrUp() won't have been called on the OperationContext for the _configsvrReshardCollection command. Neither the future that signals the coordinator document has been written nor the future that signals completion of the resharding operation is guaranteed to become ready with an error on stepdown or shutdown. As a result, the _configsvrReshardCollection command can continue running on the config server node after it has stepped down.

      We should call setAlwaysInterruptAtStepDownOrUp() before waiting on these futures. That way, if the config server primary steps down, the wait is interrupted and the primary shard for the database running the _shardsvrReshardCollection command will re-send the _configsvrReshardCollection command to the new config server primary.

      if (auto existingInstance =
              getExistingInstanceToJoin(opCtx, nss, request().getKey())) {
          // Join the existing resharding operation to prevent generating a new resharding
          // instance if the same command is issued consecutively due to client disconnect
          // etc.
          reshardCollectionJoinedExistingOperation.pauseWhileSet(opCtx);
          existingInstance.get()->getCoordinatorDocWrittenFuture().get(opCtx);
          return existingInstance;
      }
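
The excerpt above waits on the coordinator's future without first marking the operation as interruptible on stepdown. The following is a minimal, single-threaded sketch of why that matters. It does not use the server's real classes: MockOperationContext, MockFuture, WaitResult, waitFor, and joinExistingOperation are hypothetical stand-ins invented for illustration, and only the method name setAlwaysInterruptAtStepDownOrUp() comes from the ticket. The model assumes a wait is interrupted by a stepdown only when that flag has been set.

```cpp
#include <cassert>

// Hypothetical stand-in for OperationContext. It models only the
// "interrupt on stepdown" flag discussed in the ticket.
class MockOperationContext {
public:
    void setAlwaysInterruptAtStepDownOrUp() { _alwaysInterrupt = true; }
    void simulateStepDown() { _steppedDown = true; }

    // In this model, a stepdown interrupts the wait only if the flag was set.
    bool isInterrupted() const { return _alwaysInterrupt && _steppedDown; }

private:
    bool _alwaysInterrupt = false;
    bool _steppedDown = false;
};

// Hypothetical stand-in for a future that is never made ready, mirroring the
// "not guaranteed to become ready with an error on stepdown" case.
struct MockFuture {
    bool ready = false;
};

enum class WaitResult { kReady, kStillWaiting, kInterrupted };

// Polls the future a bounded number of times, checking for interruption
// before each poll. kStillWaiting models a command that keeps running.
WaitResult waitFor(const MockOperationContext& opCtx, const MockFuture& fut, int maxPolls) {
    assert(maxPolls > 0);
    for (int i = 0; i < maxPolls; ++i) {
        if (opCtx.isInterrupted()) {
            return WaitResult::kInterrupted;
        }
        if (fut.ready) {
            return WaitResult::kReady;
        }
    }
    return WaitResult::kStillWaiting;
}

// Models a retried command joining an existing operation, with or without
// the proposed call made before the wait begins.
WaitResult joinExistingOperation(bool markInterruptible) {
    MockOperationContext opCtx;
    MockFuture coordinatorDocWritten;  // never becomes ready in this model

    if (markInterruptible) {
        opCtx.setAlwaysInterruptAtStepDownOrUp();
    }
    opCtx.simulateStepDown();
    return waitFor(opCtx, coordinatorDocWritten, 100);
}
```

In this model, joinExistingOperation(false) keeps waiting after the simulated stepdown, while joinExistingOperation(true) is interrupted, which is the behavior the proposed change aims for so that the primary shard can retry against the new config server primary.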
      

            Assignee: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Reporter: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes: 0
            Watchers: 2
