[SERVER-53506] Deal with the possibility of writes coming into the coordinator after the resharding operation has finished Created: 23/Dec/20  Updated: 29/Oct/23  Resolved: 11/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task
Priority: Major - P3
Reporter: Blake Oler
Assignee: Yuhong Zhang
Resolution: Fixed
Votes: 0
Labels: PM-234-M3, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-50966 Modify PersistentTaskStore to support... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2021-03-08, Sharding 2021-03-22
Participants:
Story Points: 2

 Description   

It's possible that:

  1. The last shard sends the coordinator a write indicating that it has finished the resharding operation.
  2. The write appears to be lost due to a network error, but it is only delayed.
  3. The shard sends the write a second time, which the coordinator applies, finishing the resharding operation.
  4. The first write from the shard finally reaches the coordinator, tripping the invariant that the resharding instance for that UUID still exists.
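
The sequence above can be reproduced with a toy model. This is only a sketch: `ToyCoordinator`, `InvariantFailure`, and `replay_scenario` are hypothetical names invented for illustration and do not correspond to the server's actual classes. The sketch assumes the coordinator drops its in-memory instance as soon as the last shard reports completion.

```python
class InvariantFailure(Exception):
    """Hypothetical stand-in for a tripped server invariant."""


class ToyCoordinator:
    """Hypothetical in-memory model of a resharding coordinator."""

    def __init__(self, reshard_uuid, shards):
        # One live instance per resharding UUID, removed once every shard is done.
        self.instances = {reshard_uuid: {"pending": set(shards)}}

    def apply_done_write(self, reshard_uuid, shard):
        # Models the invariant from the scenario: the instance must still exist.
        if reshard_uuid not in self.instances:
            raise InvariantFailure("resharding instance no longer exists")
        instance = self.instances[reshard_uuid]
        instance["pending"].discard(shard)
        if not instance["pending"]:
            # Last shard reported in, so the operation is finished.
            del self.instances[reshard_uuid]


def replay_scenario():
    coord = ToyCoordinator("uuid-1", ["shardA"])
    # The retried write arrives first and finishes the operation.
    coord.apply_done_write("uuid-1", "shardA")
    try:
        # The original write arrives late and finds no instance.
        coord.apply_done_write("uuid-1", "shardA")
    except InvariantFailure as exc:
        return str(exc)
    return "ok"
```

Calling `replay_scenario()` shows the second delivery of the same write failing because the instance was already removed by the first.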

max.hirschhorn proposed two solutions:

  1. Have the donors and recipients use retryable writes so that a retried write is not applied again if it has already been applied, and
  2. Have the donors and recipients use a precondition that makes the write a no-op if it has already been applied.
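
A rough sketch of both ideas on a toy coordinator follows. Everything here is an assumption made for illustration: `IdempotentCoordinator`, its method, and the return strings are hypothetical, and the `(session_id, txn_number)` bookkeeping only loosely imitates how a retryable write is recognized as a retry. It is not the change that was committed for this ticket.

```python
class IdempotentCoordinator:
    """Hypothetical coordinator that tolerates duplicate 'done' writes."""

    def __init__(self, reshard_uuid, shards):
        self.instances = {reshard_uuid: {"pending": set(shards)}}
        # (session_id, txn_number) pairs that have already been executed.
        self.executed = set()

    def apply_done_write(self, reshard_uuid, shard,
                         session_id=None, txn_number=None):
        # Idea 1 (retryable-write style): a write carrying an identifier
        # that was already executed is acknowledged without being re-applied.
        if session_id is not None:
            key = (session_id, txn_number)
            if key in self.executed:
                return "already-executed"
            self.executed.add(key)

        # Idea 2 (precondition): only apply the write if the instance still
        # exists and this shard has not yet reported completion.
        instance = self.instances.get(reshard_uuid)
        if instance is None or shard not in instance["pending"]:
            return "no-op"

        instance["pending"].discard(shard)
        if not instance["pending"]:
            del self.instances[reshard_uuid]
        return "applied"
```

With the precondition alone, a duplicate write that arrives after the operation has finished returns `"no-op"` instead of tripping an invariant. With the identifier check, a retry of the same write is recognized before the precondition is even evaluated.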


 Comments   
Comment by Githook User [ 10/Mar/21 ]

Author: Yuhong Zhang <danielzhangyh@gmail.com> (username: YuhongZhang98)

Message: SERVER-53506 Deal with the possibility of writes coming into the coordinator after the resharding operation has finished
Branch: master
https://github.com/mongodb/mongo/commit/1f77d6a06078707510aa7bdfa78449ecf347eb3b
