Core Server / SERVER-56158

ReshardingTxnCloner won't retry on Cancellation and NotPrimary errors from remote donor shards

    • Fully Compatible
    • ALL
    • Sharding 2021-05-03
    • 1

      On a Cancellation or NotPrimary error, ReshardingTxnCloner returns true from its until() lambda, which stops the retry loop, when the cancellation token isn't canceled. This means a Cancellation or NotPrimary error returned by a remote donor shard causes the local recipient shard to stop cloning config.transactions records instead of retrying. The !cancelToken.isCanceled() condition should be cancelToken.isCanceled() (see ReshardingCollectionCloner for comparison).

      if (status.isA<ErrorCategory::CancellationError>() ||
          status.isA<ErrorCategory::NotPrimaryError>()) {
          // Cancellation and NotPrimary errors indicate the primary-only service Instance
          // will be shut down or is shutting down now - provided the cancelToken is also
          // canceled. Otherwise, the errors may have originated from a remote response rather
          // than the shard itself.
          //
          // Don't retry when primary-only service Instance is shutting down.
          return !cancelToken.isCanceled();
      }
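
To make the difference concrete, here is a minimal standalone sketch of the predicate as quoted and as proposed. It does not use the server's real types: `ErrorKind`, `CancelToken`, `untilAsQuoted()` and `untilAsProposed()` are hypothetical stand-ins, and stopping on success for non-shutdown results is an assumption made only for this sketch.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the server's Status or CancellationToken classes.
enum class ErrorKind { kOk, kCancellation, kNotPrimary };

struct CancelToken {
    bool canceled = false;
    bool isCanceled() const { return canceled; }
};

inline bool isShutdownLikeError(ErrorKind err) {
    return err == ErrorKind::kCancellation || err == ErrorKind::kNotPrimary;
}

// until() semantics: returning true stops the retry loop.
// Condition as quoted in this ticket: stops while the local token is still live.
inline bool untilAsQuoted(ErrorKind err, const CancelToken& token) {
    if (isShutdownLikeError(err)) {
        return !token.isCanceled();
    }
    return err == ErrorKind::kOk;  // assumption for this sketch: stop on success
}

// Condition as proposed: stop only when the local token is canceled, so an
// error that came from a remote donor shard gets retried.
inline bool untilAsProposed(ErrorKind err, const CancelToken& token) {
    if (isShutdownLikeError(err)) {
        return token.isCanceled();
    }
    return err == ErrorKind::kOk;  // assumption for this sketch: stop on success
}

// A remote NotPrimary error with a live local token: the quoted condition
// stops the loop, the proposed one keeps retrying.
inline void checkRemoteNotPrimary() {
    CancelToken live;
    assert(untilAsQuoted(ErrorKind::kNotPrimary, live));
    assert(!untilAsProposed(ErrorKind::kNotPrimary, live));
}
```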
      

      This pattern with AsyncTry is fairly common throughout the resharding code. We should consider making a common utility to express this logic. The withAutomaticRetry() function added as part of SERVER-51606 switched to a pattern that avoids checking the cancellation token in the until() lambda because the AsyncTry always checks the cancellation token on its own anyway.
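
A rough illustration of that second pattern, where the retry loop owns the cancellation check and the predicate only classifies errors. This is not the withAutomaticRetry() implementation or the AsyncTry API: `retryUntilDone()`, `isRetryable()`, `ErrorKind`, `CancelToken` and the `maxAttempts` bound are all hypothetical, and the loop is synchronous to keep the sketch short.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins; not the server's Status or CancellationToken classes.
enum class ErrorKind { kOk, kCancellation, kNotPrimary };

struct CancelToken {
    bool canceled = false;
    bool isCanceled() const { return canceled; }
};

// The predicate only classifies errors; it never looks at the token.
inline bool isRetryable(ErrorKind err) {
    return err == ErrorKind::kCancellation || err == ErrorKind::kNotPrimary;
}

// Hypothetical retry helper. The loop checks the token before every attempt,
// which is why the predicate does not need to. maxAttempts exists only so the
// sketch always terminates.
inline ErrorKind retryUntilDone(const std::function<ErrorKind()>& attempt,
                                const CancelToken& token,
                                int maxAttempts) {
    ErrorKind last = ErrorKind::kCancellation;
    for (int i = 0; i < maxAttempts; ++i) {
        if (token.isCanceled()) {
            return ErrorKind::kCancellation;  // local shutdown: stop immediately
        }
        last = attempt();
        if (!isRetryable(last)) {
            return last;  // success, or an error that should not be retried
        }
    }
    return last;
}
```

With this shape, a NotPrimary error from a remote shard is retried as long as the local token stays live, and a canceled local token ends the loop without running another attempt.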

            Assignee:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0
            Watchers:
            2
