  1. Core Server
  2. SERVER-61633

Resharding's RecipientStateMachine doesn't join thread pool for ReshardingOplogFetcher, leading to server crash at shutdown

Details

    • Fully Compatible
    • ALL
    • v5.1, v5.0
    • Sharding 2021-11-29
    • 145
    • 1

    Description

      resharding::cancelWhenAnyErrorThenQuiesce() uses whenAllSucceed() from future_util.h in combination with whenAll() to wait for all of the data replication components to exit. The whenAllSucceed().onError() pattern is unreliable for this purpose because the onError() lambda won't run when the executor (the scoped executor here) has already been shut down. Instead, the kExecutorShutdownStatus error is propagated back to the RecipientStateMachine through _dataReplicationQuiesced and consumed by the onCompletion() continuation, which runs on the cleanup executor.
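
      The failure mode described above can be illustrated in miniature. The following is a minimal, single-threaded sketch and does not use the server's future or executor types: ToyExecutor, Outcome, runOnErrorPattern(), and the "ShutdownInProgress" string are all hypothetical stand-ins (the last one for kExecutorShutdownStatus). It only shows that when the error handler itself must be scheduled on an executor, a shut-down executor means the handler, and therefore the quiesce step inside it, never runs.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for an executor that rejects work after shutdown.
// It runs accepted tasks inline to keep the sketch single-threaded.
class ToyExecutor {
public:
    void shutdown() { _shutdown = true; }

    // Returns false if the task was rejected because of shutdown.
    bool schedule(const std::function<void()>& task) {
        if (_shutdown) {
            return false;
        }
        task();
        return true;
    }

private:
    bool _shutdown = false;
};

struct Outcome {
    bool quiesceRan;     // did the "wait for everything to stop" step run?
    std::string status;  // status finally reported to the caller
};

// Mimics the shape of whenAllSucceed().onError(): the error handler is
// itself a task on the executor, so it can be skipped entirely.
Outcome runOnErrorPattern(ToyExecutor& executor, const std::string& originalError) {
    Outcome outcome{false, ""};
    const bool accepted = executor.schedule([&outcome, &originalError] {
        outcome.quiesceRan = true;  // stands in for cancel() + whenAll(...)
        outcome.status = originalError;
    });
    if (!accepted) {
        // Placeholder for the executor-shutdown error replacing the original one.
        outcome.status = "ShutdownInProgress";
    }
    return outcome;
}
```

      With a live ToyExecutor the handler runs and the original error is reported; after shutdown() the caller sees only the shutdown status and nothing has waited for the outstanding work.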

      Since the ReshardingOplogFetcher runs on ReshardingDataReplication::_oplogFetcherExecutor and the whenAll() was skipped, one of its tasks may still be running at shutdown after the RecipientStateMachine, and with it the ReshardingDataReplication, has been destroyed. A possible solution is to have RecipientStateMachine::_runMandatoryCleanup() join ReshardingDataReplication::_oplogFetcherExecutor.

      ExecutorFuture<void> cancelWhenAnyErrorThenQuiesce(
          const std::vector<SharedSemiFuture<void>>& futures,
          ExecutorPtr executor,
          CancellationSource cancelSource) {
          return whenAllSucceedOn(futures, executor)
              .onError([futures, executor, cancelSource](Status originalError) mutable {
                  cancelSource.cancel();
       
                  return whenAll(thenRunAllOn(futures, executor))
                      .ignoreValue()
                      .thenRunOn(executor)
                      .onCompletion([originalError](auto) { return originalError; });
              });
      }
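
      The proposed fix (joining the oplog fetcher's executor during mandatory cleanup) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the server's implementation: ToyRecipient, its members, and fetcherStoppedBeforeDestruction() are hypothetical, and a plain std::thread stands in for the dedicated thread pool. The point is only that cleanup signals the background task to stop and then joins it, so the task cannot touch the owner's state after the owner is destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical owner of a background "fetcher" task that touches member state.
class ToyRecipient {
public:
    explicit ToyRecipient(std::atomic<bool>& taskFinished) {
        // All members are initialized before the thread starts.
        _fetcher = std::thread([this, &taskFinished] {
            while (!_stop.load()) {
                ++_batchesFetched;  // uses state owned by this object
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            taskFinished.store(true);
        });
    }

    // Stands in for the mandatory cleanup step: request stop, then join.
    void runMandatoryCleanup() {
        _stop.store(true);
        if (_fetcher.joinable()) {
            _fetcher.join();
        }
    }

    ~ToyRecipient() {
        // Destroying a joinable std::thread calls std::terminate(), so
        // cleanup must have joined the fetcher before we get here.
        assert(!_fetcher.joinable());
    }

private:
    std::atomic<bool> _stop{false};
    long _batchesFetched = 0;
    std::thread _fetcher;
};

// Returns true if the background task had finished before the owner
// went out of scope.
bool fetcherStoppedBeforeDestruction() {
    std::atomic<bool> taskFinished{false};
    bool finishedBeforeDestruction = false;
    {
        ToyRecipient recipient(taskFinished);
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        recipient.runMandatoryCleanup();
        finishedBeforeDestruction = taskFinished.load();
    }
    return finishedBeforeDestruction;
}
```

      Because the join happens inside cleanup rather than in a continuation chained on another executor, it does not depend on that executor still accepting work.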
      

            People

              max.hirschhorn@mongodb.com Max Hirschhorn