Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0
Affects Version/s: None
Component/s: Sharding
Labels:
- PM-234-M3
- PM-234-T-lifecycle

Backwards Compatibility:
Fully Compatible
Sprint:
Sharding 2021-03-08, Sharding 2021-03-22
Story Points:
2
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

It's possible that:

The last shard sends the coordinator a write indicating that it has finished the resharding operation.
The write is delayed by a network error and appears lost to the shard.
The shard sends the write a second time; the coordinator applies it and finishes the resharding operation.
The first write finally reaches the coordinator and trips the invariant that the resharding instance for that UUID still exists.
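
The sequence above can be reproduced with a small in-memory model. This is only a sketch: `Coordinator`, `apply_shard_done`, `InvariantViolation`, and the dictionary of instances are hypothetical names chosen for illustration, not the server's actual classes or storage layout.

```python
import uuid


class InvariantViolation(Exception):
    """Raised when the toy coordinator's invariant fails (hypothetical)."""


class Coordinator:
    """Toy stand-in for the resharding coordinator; not server code."""

    def __init__(self, reshard_id, shards):
        # Map each active resharding UUID to the shards that have not finished.
        self.instances = {reshard_id: set(shards)}

    def apply_shard_done(self, reshard_id, shard):
        # Models the invariant from this ticket: the instance must still exist.
        if reshard_id not in self.instances:
            raise InvariantViolation(f"no resharding instance for {reshard_id}")
        pending = self.instances[reshard_id]
        pending.discard(shard)
        if not pending:
            # The last shard finished, so the operation's state is removed.
            del self.instances[reshard_id]


def replay_delayed_write():
    """Return True if the delayed first write trips the invariant."""
    reshard_id = uuid.uuid4()
    coordinator = Coordinator(reshard_id, ["shard0"])
    # Attempt 1 is stuck in the network; the retry (attempt 2) lands first.
    coordinator.apply_shard_done(reshard_id, "shard0")
    try:
        # Attempt 1 finally arrives after the operation has already finished.
        coordinator.apply_shard_done(reshard_id, "shard0")
    except InvariantViolation:
        return True
    return False
```

Calling `replay_delayed_write()` in this model returns `True`, because the second delivery finds no instance for the UUID.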

max.hirschhorn proposed two solutions:

Have the donors and recipients use retryable writes so that the shard doesn't attempt the write again if it's already been applied, and
Have the donors and recipients use a precondition that will make the write a no-op if the write has already been applied.
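
Both proposals can be sketched against a toy in-memory coordinator. Everything here is an assumption made for illustration: the class, the method names, and the `(session_id, txn_number)` key are a simplified stand-in for how retryable writes identify a statement, not the server's implementation.

```python
class Coordinator:
    """Toy coordinator showing two ways to make a duplicate write harmless."""

    def __init__(self, reshard_id, shards):
        # Active resharding operations: UUID -> shards that have not finished.
        self.instances = {reshard_id: set(shards)}
        # Identities of writes already executed; outlives the instance itself.
        self.applied = set()

    def _finish(self, reshard_id, shard):
        pending = self.instances[reshard_id]
        pending.discard(shard)
        if not pending:
            del self.instances[reshard_id]

    def apply_retryable(self, session_id, txn_number, reshard_id, shard):
        """Option 1: skip a write whose identity was already executed."""
        key = (session_id, txn_number)
        if key in self.applied:
            return False  # duplicate delivery, nothing is re-executed
        self.applied.add(key)
        self._finish(reshard_id, shard)
        return True

    def apply_with_precondition(self, reshard_id, shard):
        """Option 2: apply only if the shard is still marked as pending."""
        pending = self.instances.get(reshard_id)
        if pending is None or shard not in pending:
            return False  # precondition failed, the write becomes a no-op
        self._finish(reshard_id, shard)
        return True
```

In this model the first option relies on the coordinator remembering which writes it has executed, while the second needs no extra bookkeeping because the precondition is evaluated against the current state.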

Related to: SERVER-50966 Modify PersistentTaskStore to support waiting for majority with the WaitForMajorityService (Closed)

Assignee: Yuhong Zhang
Reporter: Blake Oler
Participants: Blake Oler, Githook User, Yuhong Zhang
Votes: 0
Watchers: 3

Created: Dec 23 2020 09:43:54 PM UTC
Updated: Oct 29 2023 09:59:17 PM UTC
Resolved: Mar 11 2021 04:01:40 AM UTC
Confidence Status Last Update: 05/Mar/21 3:57 PM