Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Replication
Operating System: ALL
Case:
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The _flushRoutingTableCacheUpdatesWithWriteConcern command, run with write concern {w:majority, wtimeout: 0} on a shard primary, fails to block indefinitely when the shard is undergoing network issues and its replica set has no primary. Per the docs, blocking indefinitely is the expected behavior when wtimeout is 0.

A WriteConcernFailed error status is returned, which is not retriable (by design). If a retriable error (such as HostNotFound or HostUnreachable) had been returned instead, _flushRoutingTableCacheUpdatesWithWriteConcern would have been retried until it succeeded (once the shard's replica set was healthy again), and the resharding operation would not have failed.
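
The difference between the two error classes can be illustrated with a minimal sketch. This is not the server's implementation: CommandError, run_with_retry, make_flaky_shard, and the max_attempts bound are hypothetical names introduced for illustration only, and the retriable set contains just the two error names mentioned above.

```python
# Hypothetical sketch of retry-on-retriable-error behavior.
# None of these names come from the MongoDB code base.

# Assumption: only the two error names cited in this ticket are retriable.
RETRIABLE = {"HostNotFound", "HostUnreachable"}


class CommandError(Exception):
    """Stand-in for a failed command status; code_name is the error name."""

    def __init__(self, code_name):
        super().__init__(code_name)
        self.code_name = code_name


def run_with_retry(send_command, max_attempts=5):
    """Call send_command until it succeeds or raises a non-retriable error.

    Returns (result, attempts). max_attempts is an illustrative bound so the
    sketch always terminates.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return send_command(), attempts
        except CommandError as err:
            # Non-retriable errors (e.g. WriteConcernFailed) propagate at once.
            if err.code_name not in RETRIABLE or attempts >= max_attempts:
                raise


def make_flaky_shard(errors):
    """Return a fake command that raises each listed error once, then succeeds."""
    pending = list(errors)

    def send():
        if pending:
            raise CommandError(pending.pop(0))
        return "ok"

    return send
```

In this sketch, a shard that first reports HostUnreachable and then recovers lets the caller succeed on a later attempt, whereas a single WriteConcernFailed ends the operation immediately, which mirrors the resharding failure described above.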

(Note: this was investigated on v6.0)

Is related to:

SERVER-104317 Update WithAutomaticRetry to retry on WCEs (In Code Review)
SERVER-102452 Make ReshardingDonorService retry on WriteConcernFailure when finishing (Closed)

Assignee: Unassigned
Reporter: Nandini Bhartiya
Participants: Nandini Bhartiya
Votes: 0
Watchers: 15

Created: Jul 26 2024 07:53:33 PM UTC
Updated: Apr 24 2025 08:30:25 PM UTC
