[SERVER-56158] ReshardingTxnCloner won't retry on Cancellation and NotPrimary errors from remote donor shards Created: 19/Apr/21  Updated: 29/Oct/23  Resolved: 20/Apr/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: PM-234-M3, PM-234-T-config-txn-clone
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2021-05-03
Participants:
Story Points: 1

 Description   

ReshardingTxnCloner returns true from its until() lambda, which ends the AsyncTry retry loop, when the cancellation token isn't canceled. As a result, a Cancellation or NotPrimary error returned by a remote donor shard causes the local recipient shard to stop cloning config.transactions records instead of retrying. The !cancelToken.isCanceled() condition should be cancelToken.isCanceled() (see ReshardingCollectionCloner for comparison).

if (status.isA<ErrorCategory::CancellationError>() ||
    status.isA<ErrorCategory::NotPrimaryError>()) {
    // Cancellation and NotPrimary errors indicate the primary-only service Instance
    // will be shut down or is shutting down now - provided the cancelToken is also
    // canceled. Otherwise, the errors may have originated from a remote response rather
    // than the shard itself.
    //
    // Don't retry when primary-only service Instance is shutting down.
    return !cancelToken.isCanceled();
}
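
For comparison, the intended behavior can be sketched in isolation. This is only an illustration under stated assumptions: ErrorKind, FakeCancelToken, and shouldStopRetrying() are hypothetical stand-ins, not the server's Status, CancellationToken, or until() lambda, and treating every other error as fatal is a simplification made for this sketch.

```cpp
// Hypothetical stand-ins for the server's error categories and cancellation
// token; these are NOT the real mongo::Status or mongo::CancellationToken.
enum class ErrorKind { kOk, kCancellation, kNotPrimary, kOther };

struct FakeCancelToken {
    bool canceled;
    bool isCanceled() const { return canceled; }
};

// Mirrors the meaning of an until() predicate: returning true stops retrying.
bool shouldStopRetrying(ErrorKind kind, const FakeCancelToken& cancelToken) {
    if (kind == ErrorKind::kOk) {
        return true;  // The operation succeeded, so there is nothing to retry.
    }
    if (kind == ErrorKind::kCancellation || kind == ErrorKind::kNotPrimary) {
        // Stop only when the local token is canceled (the local instance is
        // shutting down). Otherwise the error may have come from a remote
        // donor shard, so keep retrying.
        return cancelToken.isCanceled();
    }
    // Assumption for this sketch only: any other error ends the loop.
    return true;
}
```

With the negated condition from the snippet above, the two Cancellation/NotPrimary cases would be swapped: a remote error would stop the loop, and a local shutdown would keep it running.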

This pattern with AsyncTry is fairly common throughout the resharding code. We should consider making a common utility to express this logic. The withAutomaticRetry() function added as part of SERVER-51606 switched to a pattern that avoids checking the cancellation token in the until() lambda because the AsyncTry always checks the cancellation token on its own anyway.
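
A rough sketch of what such a common utility might look like, assuming a simplified synchronous loop rather than the future-based AsyncTry. The names retryUntilDone(), Outcome, and FakeCancelToken are hypothetical and do not correspond to any server API; the point is only that the loop checks the cancellation token itself, so callers never need to consult it in their own stop condition.

```cpp
#include <functional>

// Hypothetical outcome classification; not the server's ErrorCategory.
enum class Outcome { kOk, kCancellation, kNotPrimary, kFatal };

// Hypothetical stand-in for a cancellation token.
struct FakeCancelToken {
    bool canceled;
    bool isCanceled() const { return canceled; }
};

// Runs body() until it succeeds, fails fatally, the token is canceled, or
// maxAttempts (assumed to be >= 1) is exhausted. Cancellation and NotPrimary
// results are treated as possibly remote and therefore retried.
Outcome retryUntilDone(const std::function<Outcome()>& body,
                       const FakeCancelToken& cancelToken,
                       int maxAttempts) {
    Outcome last = Outcome::kCancellation;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        // The loop, not the caller, decides whether a local shutdown is underway.
        if (cancelToken.isCanceled()) {
            return Outcome::kCancellation;
        }
        last = body();
        if (last == Outcome::kOk || last == Outcome::kFatal) {
            return last;
        }
        // Otherwise fall through and try again.
    }
    return last;
}
```

Because the token check lives in one place, a call site cannot accidentally invert it the way the until() lambda in this ticket did.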



 Comments   
Comment by Githook User [ 20/Apr/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-56158 Add resharding::WithAutomaticRetry() util around AsyncTry.

Replaces direct usages of AsyncTry in ReshardingCollectionCloner,
ReshardingTxnCloner, and ReshardingOplogBatchApplier.
Branch: master
https://github.com/mongodb/mongo/commit/018063d3f7e77781aebd95de9f992aa21d5cb299
