Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.2.0-rc0
Affects Version/s: None
Component/s: None
Labels: resharding-success-rate-improvements
Assigned Teams: Cluster Scalability
Backwards Compatibility: Fully Compatible
Sprint: ClusterScalability Jun9-Jun23
Confidence Status: None
Work Order: 3
Size Category: TBD
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None
Estimated Weeks: 0

The steps in the "creating-collection" state and the "building-index" state, namely _createTemporaryReshardingCollectionThenTransitionToCloning and _buildIndexThenTransitionToApplying, do not have their own retry logic. Instead, they rely on the top-level retry logic in _runUntilStrictConsistencyOrErrored, which uses primary_only_service_helpers::kDefaultRetryabilityPredicate. That predicate does not treat LockTimeout as retryable. As a result, when a LockTimeout error occurs, e.g. while creating the temporary collection or while creating the indexes, the error causes the entire resharding operation to fail.
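The effect of the predicate can be illustrated with a minimal Python sketch. This is not the server's C++ code: `ErrorCode`, `StepError`, `default_predicate`, `lock_timeout_aware_predicate`, `run_with_retry`, and `make_flaky_step` are all hypothetical names, and `NETWORK_TIMEOUT` is an arbitrary placeholder for whatever the default predicate actually accepts. The only property taken from the description is that the default predicate does not retry on LockTimeout.

```python
from enum import Enum, auto


class ErrorCode(Enum):
    # Hypothetical stand-ins for server error codes, not the real enum.
    NETWORK_TIMEOUT = auto()
    LOCK_TIMEOUT = auto()
    BAD_VALUE = auto()


class StepError(Exception):
    """Hypothetical error raised by a resharding step."""

    def __init__(self, code):
        super().__init__(code.name)
        self.code = code


def default_predicate(code):
    # Assumption: models a default predicate that excludes LockTimeout.
    return code is ErrorCode.NETWORK_TIMEOUT


def lock_timeout_aware_predicate(code):
    # Same as the default, but also treats LockTimeout as retryable.
    return default_predicate(code) or code is ErrorCode.LOCK_TIMEOUT


def run_with_retry(step, is_retryable, max_attempts=5):
    """Run `step`, retrying only while the predicate accepts the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except StepError as err:
            if attempt == max_attempts or not is_retryable(err.code):
                raise


def make_flaky_step(failures, code=ErrorCode.LOCK_TIMEOUT):
    """Build a step that fails `failures` times, then returns its call count."""
    state = {"calls": 0}

    def step():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise StepError(code)
        return state["calls"]

    return step
```

With `default_predicate`, the first LockTimeout propagates out of `run_with_retry` and would fail the whole operation. With `lock_timeout_aware_predicate`, the same step succeeds once the transient failures stop.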

Currently, the ShardingDDLCoordinator (ReshardCollectionCoordinator on the primary shard) considers LockTimeout a retryable error because it is an Interruption error. So after the resharding operation aborts, the ShardingDDLCoordinator retries the _configsvrReshardCollection command, which initiates a new resharding operation. However, it is still not user-friendly for resharding to start over just because of a LockTimeout error.
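The cost difference between restarting the whole operation and retrying only the failed step can be sketched as follows. This is a simplified Python model, not the actual coordinator code: the four step names loosely follow the states mentioned in this ticket, and `LockTimeout`, `make_steps`, `restart_whole_operation`, and `retry_failed_step` are hypothetical.

```python
class LockTimeout(Exception):
    """Hypothetical stand-in for a LockTimeout error."""


def make_steps(log, fail_once_at):
    """Build four steps that record their name in `log`.

    The step named `fail_once_at` raises LockTimeout on its first run only.
    """
    pending = {fail_once_at}

    def make(name):
        def step():
            log.append(name)
            if name in pending:
                pending.discard(name)
                raise LockTimeout(name)
        return step

    names = ("creating-collection", "cloning", "building-index", "applying")
    return [make(n) for n in names]


def restart_whole_operation(steps, max_attempts=3):
    # Models an outer retry: any LockTimeout restarts from the first step.
    for _ in range(max_attempts):
        try:
            for step in steps:
                step()
            return
        except LockTimeout:
            continue
    raise LockTimeout("exhausted")


def retry_failed_step(steps, max_attempts=3):
    # Models step-level retry: only the failing step is run again.
    for step in steps:
        for attempt in range(max_attempts):
            try:
                step()
                break
            except LockTimeout:
                if attempt == max_attempts - 1:
                    raise
```

In this model, a single LockTimeout in "building-index" makes the outer retry run "cloning" twice, while the step-level retry runs it once. In a real resharding operation the repeated cloning work is what makes starting over expensive.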

Related to: SERVER-106143 Audit lock acquisitions of internal resharding collections to verify if lockTimeout errors can occur

Backlog

Assignee: Kruti Shah
Reporter: Cheahuychou Mao
Participants: Cheahuychou Mao, Githook User, Kruti Shah
Votes: 0
Watchers: 5

Created: May 29 2025 04:31:10 PM UTC
Updated: Jun 11 2025 06:05:48 PM UTC
Resolved: Jun 11 2025 06:05:48 PM UTC
Confidence Status Last Update: 02/Jun/25 7:37 PM
