Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.0-rc0
Affects Version/s: None
Component/s: Sharding
Labels:
- PM-234-M2
- PM-234-T-lifecycle

Backwards Compatibility: Fully Compatible
Sprint: Sharding 2021-03-22, Sharding 2021-04-05, Sharding EMEA 2021-05-03, Sharding EMEA 2021-05-17
Linked BF Score: 141
Story Points: 2
Confidence Status: None
Work Order: 3
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Background

After a coordinator transitions to kRenaming, there is a potential gap between these two events:

1. The recipient shard updates its routing info.
2. The recipient shard renames the collection locally.

In this gap, a router may attempt to read from the collection while it does not yet exist at the storage level. This can result in a NamespaceNotFound error, which isn't considered retryable.
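The window can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions: `FakeRecipientShard`, `ReadResult`, `readDuringGap`, and the namespace strings are hypothetical names invented for illustration and are not MongoDB server code.

```cpp
#include <set>
#include <string>

// Hypothetical result type; the real server reports an error code instead.
enum class ReadResult { kOk, kNamespaceNotFound };

// Hypothetical model of a recipient shard: it tracks whether routing info has
// been refreshed and which collections exist locally at the storage level.
struct FakeRecipientShard {
    std::set<std::string> localCollections;
    bool routingRefreshed = false;

    void refreshRoutingInfo() { routingRefreshed = true; }

    void renameLocally(const std::string& from, const std::string& to) {
        localCollections.erase(from);
        localCollections.insert(to);
    }

    // A read succeeds only if the namespace exists locally.
    ReadResult read(const std::string& ns) const {
        return localCollections.count(ns) ? ReadResult::kOk
                                          : ReadResult::kNamespaceNotFound;
    }
};

// Performs a read between the routing refresh and the local rename, then
// completes the rename. Returns the result of the read made inside the gap.
ReadResult readDuringGap(FakeRecipientShard& shard) {
    shard.refreshRoutingInfo();                   // event 1: routing info updated
    ReadResult inGap = shard.read("db.coll");     // router read lands in the gap
    shard.renameLocally("db.tempColl", "db.coll");  // event 2: local rename
    return inGap;
}
```

In this model, a read issued inside the gap fails, while the same read issued after the rename succeeds.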

Solution

To prevent this, a given recipient shard needs to hold the CSR's (CollectionShardingRuntime's) critical section from before the point at which the refresh completes until the rename itself has completed.

To do this, create a resharding-specific RAII type that can be fed a new opCtx for entering and exiting the critical section. The destructor of this RAII type must leave the critical section, so that if the resharding operation errors out, the shard isn't permanently stuck in the critical section.
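A minimal sketch of the RAII idea follows. It is not the implementation that landed for this ticket: `FakeCollectionShardingRuntime`, `FakeOpCtx`, `ScopedReshardingCriticalSection`, and `refreshAndRename` are hypothetical stand-ins, and the critical section is reduced to a boolean flag so that the example is self-contained.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the per-collection sharding state (the CSR).
// It only tracks whether the critical section is currently held.
struct FakeCollectionShardingRuntime {
    bool inCriticalSection = false;
};

// Hypothetical stand-in for an operation context.
struct FakeOpCtx {
    std::string description;
};

// RAII guard: enters the critical section on construction and always leaves
// it on destruction, including during stack unwinding after an error.
class ScopedReshardingCriticalSection {
public:
    ScopedReshardingCriticalSection(FakeOpCtx& opCtx, FakeCollectionShardingRuntime& csr)
        : _csr(csr) {
        (void)opCtx;  // a real implementation would use the opCtx to take locks
        assert(!_csr.inCriticalSection);
        _csr.inCriticalSection = true;
    }

    ~ScopedReshardingCriticalSection() {
        // The ticket suggests using a fresh opCtx here; this sketch only flips the flag.
        _csr.inCriticalSection = false;
    }

    ScopedReshardingCriticalSection(const ScopedReshardingCriticalSection&) = delete;
    ScopedReshardingCriticalSection& operator=(const ScopedReshardingCriticalSection&) = delete;

private:
    FakeCollectionShardingRuntime& _csr;
};

// Holds the critical section across the routing refresh and the local rename.
// `failRename` simulates the resharding operation erroring out midway.
// Returns whether the critical section was held at the moment of the rename.
bool refreshAndRename(FakeCollectionShardingRuntime& csr, bool failRename) {
    FakeOpCtx opCtx{"resharding recipient"};
    ScopedReshardingCriticalSection guard(opCtx, csr);
    // ... the routing info refresh would complete here ...
    if (failRename) {
        throw std::runtime_error("simulated rename failure");
    }
    // ... the local rename would complete here ...
    return csr.inCriticalSection;
}
```

Because the guard releases the flag in its destructor, both the success path and the exception path leave `csr.inCriticalSection` false once `refreshAndRename` exits.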

depends on

SERVER-53258 [Resharding] Reject writes in opObserver when disallowWritesForResharding is true

Closed

is depended on by

SERVER-56659 Use local write concern when acquiring and releasing resumable critical section in resharding recipient

Closed

related to

SERVER-56612 Use the resharding-specific refresh function when recovering a resharding operation in the drain mode

Closed

SERVER-56785 Critical section is wrongly reacquired after completing a resharding operation on the donor shard

Closed

SERVER-109322 featureFlagReshardingSkipCloningAndApplyingIfApplicable makes resharding critical section get acquired on a non-donor db primary shard before critical section is engaged by coordinator

Needs Scheduling

Assignee: Sergi Mateo Bellido
Reporter: Blake Oler
Participants: Blake Oler, Githook User, Sergi Mateo Bellido
Votes: 0
Watchers: 4

Created: Jan 08 2021 02:57:10 PM UTC
Updated: Aug 14 2025 10:02:53 PM UTC
Resolved: May 05 2021 05:03:22 AM UTC
Confidence Status Last Update: 10/Mar/21 12:27 PM
