- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Fully Compatible
- Sharding 2021-03-22, Sharding 2021-04-05, Sharding EMEA 2021-05-03, Sharding EMEA 2021-05-17
- 141
- 2
Background
After a coordinator transitions to kRenaming, there is a potential gap between these two events:
- The recipient shard updates its routing info, and
- The recipient shard renames the collection locally.
In this gap, a router can attempt to read from the collection while the collection doesn't actually exist at the storage level. The read can then fail with a NamespaceNotFound error, which isn't considered retryable.
Solution
To prevent this, each recipient shard needs to hold the CSR's critical section from before the point at which the refresh completes until the rename itself has completed.
To do this, create a resharding-specific RAII type that can be given a new opCtx for entering and exiting the critical section. The destructor of this RAII type must leave the critical section, so that if the resharding operation errors out, the shard isn't permanently stuck in the critical section.
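A minimal, self-contained sketch of the RAII idea follows. It is not the server implementation: `OperationContext`, `CollectionCriticalSection`, `ScopedReshardingCriticalSection`, and `recipientRefreshAndRename` are all hypothetical stand-ins invented for illustration, and the refresh and rename steps are reduced to comments. It only shows the shape of the guard: enter on construction, leave on destruction, with a fresh opCtx obtained for each step.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the server's per-operation context.
struct OperationContext {
    std::string desc;
};

// Hypothetical stand-in for the CSR's critical section on one namespace.
class CollectionCriticalSection {
public:
    void enter(OperationContext*) { _held = true; }
    void exit(OperationContext*) { _held = false; }
    bool isHeld() const { return _held; }

private:
    bool _held = false;
};

// RAII guard (hypothetical name). A factory supplies a new opCtx for the
// enter step and another for the exit step, since the context that entered
// the section may no longer be usable when the guard is destroyed.
class ScopedReshardingCriticalSection {
public:
    using OpCtxFactory = std::function<OperationContext()>;

    ScopedReshardingCriticalSection(CollectionCriticalSection& cs, OpCtxFactory makeOpCtx)
        : _cs(cs), _makeOpCtx(std::move(makeOpCtx)) {
        OperationContext opCtx = _makeOpCtx();
        _cs.enter(&opCtx);
    }

    // Runs on normal scope exit and during stack unwinding, so an error in
    // the resharding operation cannot leave the section held.
    ~ScopedReshardingCriticalSection() {
        OperationContext opCtx = _makeOpCtx();
        _cs.exit(&opCtx);
    }

    ScopedReshardingCriticalSection(const ScopedReshardingCriticalSection&) = delete;
    ScopedReshardingCriticalSection& operator=(const ScopedReshardingCriticalSection&) = delete;

private:
    CollectionCriticalSection& _cs;
    OpCtxFactory _makeOpCtx;
};

// Hypothetical recipient-side flow: hold the section across both the
// routing refresh and the local rename. Returns false if the rename fails.
bool recipientRefreshAndRename(CollectionCriticalSection& cs, bool failRename) {
    try {
        ScopedReshardingCriticalSection guard(
            cs, [] { return OperationContext{"resharding recipient"}; });
        // 1. Refresh routing info (elided). Reads are blocked from here on.
        assert(cs.isHeld());
        // 2. Rename the collection locally (elided); simulate a failure.
        if (failRename) {
            throw std::runtime_error("rename failed");
        }
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

In this sketch the guard's scope covers both steps, so a reader never observes the refreshed routing info without the renamed collection, and the failure path releases the section through the destructor rather than through explicit cleanup code. A factory that can throw inside the destructor would terminate the program here; a real implementation would need to handle that case.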
depends on:
- SERVER-53258 [Resharding] Reject writes in opObserver when disallowWritesForResharding is true (Closed)
is depended on by:
- SERVER-56659 Use local write concern when acquiring and releasing resumable critical section in resharding recipient (Closed)
related to:
- SERVER-56612 Use the resharding-specific refresh function when recovering a resharding operation in the drain mode (Closed)
- SERVER-56785 Critical section is wrongly reacquired after completing a resharding operation on the donor shard (Closed)