Details
Bug
Resolution: Fixed
Major - P3
None
Fully Compatible
ALL
Sharding 2021-05-03
1
Description
ReshardingTxnCloner returns true from its until() lambda, which ends the retry loop, when the cancellation token isn't canceled. As a result, a Cancellation or NotPrimary error returned by the remote donor shard causes the local recipient shard to stop cloning config.transactions records. The !cancelToken.isCanceled() condition should be cancelToken.isCanceled() (see ReshardingCollectionCloner for comparison).
if (status.isA<ErrorCategory::CancellationError>() ||
    status.isA<ErrorCategory::NotPrimaryError>()) {
    // Cancellation and NotPrimary errors indicate the primary-only service Instance
    // will be shut down or is shutting down now - provided the cancelToken is also
    // canceled. Otherwise, the errors may have originated from a remote response rather
    // than the shard itself.
    //
    // Don't retry when primary-only service Instance is shutting down.
    return !cancelToken.isCanceled();
}
This pattern with AsyncTry is fairly common throughout the resharding code. We should consider making a common utility to express this logic. The withAutomaticRetry() function added as part of SERVER-51606 switched to a pattern that avoids checking the cancellation token in the until() lambda because the AsyncTry always checks the cancellation token on its own anyway.
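As a rough illustration of what such a common utility might decide, here is a minimal, self-contained sketch of the corrected stop condition. It does not use the server's real types: ErrorKind, shouldStopRetrying(), and the handling of kOtherError are hypothetical stand-ins invented for this example, and only the Cancellation/NotPrimary branch reflects the logic described in this ticket.

```cpp
// Hypothetical stand-in for the server's error categories (not mongo::ErrorCategory).
enum class ErrorKind { kOk, kCancellation, kNotPrimary, kOtherError };

// Returns true when the retry loop should stop, mirroring the meaning of an
// until() predicate returning true.
constexpr bool shouldStopRetrying(ErrorKind error, bool tokenCanceled) {
    switch (error) {
        case ErrorKind::kOk:
            // Success: nothing left to retry.
            return true;
        case ErrorKind::kCancellation:
        case ErrorKind::kNotPrimary:
            // Stop only if the local cancellation token is canceled. Otherwise the
            // error may have come from a remote response and should be retried.
            return tokenCanceled;
        case ErrorKind::kOtherError:
            // Assumption for this sketch only: treat any other error as fatal.
            return true;
    }
    return true;
}

// The buggy condition from this ticket, kept for contrast: it stops exactly
// when the token is NOT canceled.
constexpr bool buggyShouldStopRetrying(bool tokenCanceled) {
    return !tokenCanceled;
}

// A remote NotPrimary error with a live token must be retried, not treated as shutdown.
static_assert(!shouldStopRetrying(ErrorKind::kNotPrimary, false), "retry remote error");
static_assert(shouldStopRetrying(ErrorKind::kNotPrimary, true), "stop on local shutdown");
static_assert(buggyShouldStopRetrying(false), "buggy version halts on remote error");
```

If the retry loop already checks the cancellation token on its own, as the ticket notes for AsyncTry, a shared helper could plausibly drop the tokenCanceled parameter and simply report Cancellation and NotPrimary errors as retryable.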