Core Server / SERVER-55328

Call Pipeline::dispose() on cleanup executor in resharding data replication components

    • Fully Compatible
    • Sharding 2021-04-05
    • 2

      Task executors are allowed to refuse work, and the .onCompletion() continuation won't run if the task executor has been shut down. This is especially problematic for the ReshardingCollectionCloner after the changes from SERVER-54959 because the noCursorTimeout cursor will be permanently leaked on stepdown. We should instead be using RecipientStateMachine::getInstanceCleanupExecutor() to run the .onCompletion() continuation.
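
      For illustration, the current chain schedules its cleanup continuation on the same executor that does the cloning work. The fragment below is only a sketch of that problematic shape (paraphrased, not the actual source); once `executor` has been shut down on stepdown, the whole .onCompletion() callback is dropped and dispose() is never called:

      // Sketch of the current pattern: cleanup is scheduled on the ordinary
      // task executor. If that executor has been shut down, it refuses the
      // work, the .onCompletion() callback never runs, and the pipeline's
      // noCursorTimeout cursor is leaked.
      .on(executor, cancelToken)
      .onCompletion([chainCtx](Status status) {
          if (chainCtx->pipeline) {
              auto opCtx = cc().makeOperationContext();
              chainCtx->pipeline->dispose(opCtx.get());
              chainCtx->pipeline.reset();
          }
          return status;
      });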

      ReshardingCollectionCloner::run() and ReshardingTxnCloner::run() should be changed to additionally accept the cleanup task executor and should return a SemiFuture<void> so the caller must explicitly do .thenRunOn(**executor) to chain any further continuations.
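
      One possible shape for the updated declaration is sketched below (parameter names and order beyond the added cleanup executor are illustrative, not taken from the source); ReshardingTxnCloner::run() would change in the same way:

      class ReshardingCollectionCloner {
      public:
          // Accepts both the ordinary task executor and the cleanup executor,
          // and returns a SemiFuture<void> so the caller must pick an executor
          // explicitly before chaining any further continuations.
          SemiFuture<void> run(std::shared_ptr<executor::TaskExecutor> executor,
                               std::shared_ptr<executor::TaskExecutor> cleanupExecutor,
                               CancelationToken cancelToken);
          // ...
      };

      Inside run(), the tail of the future chain would then move pipeline disposal onto the cleanup executor: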

      .on(executor, cancelToken)
      .thenRunOn(cleanupExecutor)
      .onCompletion([chainCtx](Status status) {
          if (chainCtx->pipeline) {
              // Use a separate Client to make a better effort of calling dispose() even when the
              // CancelationToken has been canceled.
              auto serviceContext = cc().getServiceContext();
              auto clientStrand = ClientStrand::make(
                  serviceContext->makeClient("ReshardingCollectionClonerCleanup"));
              auto clientGuard = clientStrand->bind();
      
              auto opCtx = clientGuard->makeOperationContext();
              chainCtx->pipeline->dispose(opCtx.get());
              chainCtx->pipeline.reset();
          }
      
          return status;
      })
      .semi();
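
      On the caller side (sketched here as if from inside RecipientStateMachine; the member and variable names are illustrative), the returned SemiFuture<void> forces an explicit .thenRunOn() before any further continuations are chained:

      // Pass the ordinary executor for the cloning work and the instance
      // cleanup executor for disposal, then explicitly hop back onto the
      // ordinary executor to continue the recipient's state machine.
      return _collectionCloner
          ->run(**executor, getInstanceCleanupExecutor(), cancelToken)
          .thenRunOn(**executor)
          .then([this] {
              // ... proceed with the next step of resharding data replication
          });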
      

            Assignee: Janna Golden (janna.golden@mongodb.com)
            Reporter: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes: 0
            Watchers: 2
