Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.3.0-rc0
Affects Version/s: None
Component/s: None
Labels:
- resharding-success-rate-improvements

Assigned Teams:

Cluster Scalability
Backwards Compatibility:
Fully Compatible
Sprint:
ClusterScalability Jul7-Jul20, ClusterScalability Jul21-Aug3, ClusterScalability Aug4-18
Linked BF Score:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

In general, resharding passes wtimeout: 0 in its write concern options. However, catalog cache refreshes sometimes force a no-op write, which uses a wtimeout of 60 seconds. If a resharding service gets a write concern timeout, it treats the error as fatal. This was seen in AF-1904 and fixed in SERVER-102452 for the specific case hit in that ticket.

This ticket covers the follow-up work to make all resharding services retry on write concern errors (WCEs) in general. We should be able to do this by making the kRetryabilityPredicateIncludeWriteConcernTimeout predicate added in SERVER-102452 the default for resharding::withAutomaticRetry. As part of this, we should audit the idempotency of anything retried by resharding::withAutomaticRetry: SERVER-102452 found that retries would reset at least some in-memory state (such as metrics) that should not be reset.
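As a rough illustration of the proposed change, the sketch below shows a retry helper whose default predicate treats a write concern timeout as retryable, and why state that must survive retries has to live outside the retried body. It is not the server's code: `ErrorKind`, `OpError`, `withRetrySketch`, both predicate functions, and the `maxAttempts` cap are hypothetical names and simplifications made up for this example, and the real resharding::withAutomaticRetry may differ substantially (for instance in how it backs off or bounds retries).

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical error categories for this sketch; not MongoDB's ErrorCodes.
enum class ErrorKind { kNetworkTimeout, kWriteConcernTimeout, kFatal };

struct OpError : std::runtime_error {
    ErrorKind kind;
    OpError(ErrorKind k, const std::string& msg) : std::runtime_error(msg), kind(k) {}
};

using RetryPredicate = std::function<bool(ErrorKind)>;

// Stand-in for a predicate that treats write concern timeouts as fatal.
inline bool retryTransientOnly(ErrorKind k) {
    return k == ErrorKind::kNetworkTimeout;
}

// Stand-in for a predicate that also retries on write concern timeouts.
inline bool retryIncludingWriteConcernTimeout(ErrorKind k) {
    return k == ErrorKind::kNetworkTimeout || k == ErrorKind::kWriteConcernTimeout;
}

// Runs fn until it succeeds, the predicate rejects the error, or the
// (assumed) attempt cap is reached. The permissive predicate is the default.
template <typename Fn>
auto withRetrySketch(Fn&& fn,
                     int maxAttempts,
                     RetryPredicate shouldRetry = retryIncludingWriteConcernTimeout)
    -> decltype(fn()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return fn();
        } catch (const OpError& e) {
            if (attempt >= maxAttempts || !shouldRetry(e.kind)) {
                throw;  // Non-retryable error or out of attempts: propagate.
            }
        }
    }
}

// Example: the counter is created once, outside the retried body, so a retry
// does not reset it. Creating it inside the lambda would zero it each attempt.
inline int exampleAttemptsUntilSuccess() {
    int attempts = 0;  // State that must survive retries.
    int value = withRetrySketch(
        [&] {
            if (++attempts < 3) {
                throw OpError(ErrorKind::kWriteConcernTimeout, "waiting for replication timed out");
            }
            return 42;
        },
        5);
    assert(value == 42);
    return attempts;
}
```

With the permissive default, the body in `exampleAttemptsUntilSuccess` runs three times before succeeding. Passing `retryTransientOnly` as the third argument would instead propagate the first write concern timeout to the caller, which mirrors the fatal behavior described above.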

depends on

SERVER-105325 Audit and fix uses of resharding::WithAutomaticRetry in the resharding codebase

Closed

is related to

SERVER-108642 Resharding services need to handle failures after state transitions

Open

SERVER-107952 Fix resharding hang when FlushReshardingStateChangeCmd fails

Closed

SERVER-109367 Coalesce all retry mechanisms in CS owned code

Backlog

SERVER-102452 Make ReshardingDonorService retry on WriteConcernFailure when finishing

Closed

related to

SERVER-92936 Writes with a {w: majority, wtimeout:0} fail in case of network errors.

Backlog

SERVER-104258 Resharding Can Hang If Recipient Fails During ShardsvrReshardRecipientClone

Closed


Assignee: Ben Gawel (Inactive)
Reporter: Janna Golden (Inactive)
Participants: Ben Gawel, Githook User, Janna Golden
Votes: 0
Watchers: 6

Created: Apr 24 2025 08:28:58 PM UTC
Updated: Aug 18 2025 07:40:30 PM UTC
Resolved: Aug 18 2025 07:30:29 PM UTC
