[SERVER-49578] Handle reported unrecoverable errors from donors/recipients in the coordinator Created: 16/Jul/20  Updated: 06/Dec/22  Resolved: 31/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Duplicate Votes: 0
Labels: PM-234-M3, PM-234-T-error-flow
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-54000 Make errors propagate from the Reshar... Closed
is duplicated by SERVER-51800 Refresh both donor and recipient shar... Closed
Gantt End to End
has to be finished together with SERVER-53006 Complete TODO listed in SERVER-51800 Closed
Assigned Teams:
Sharding NYC
Participants:
Story Points: 3

 Description   

The coordinator observer will see the write to config.reshardingOperations and detect that it moves a shard into the error state. The coordinator will then transition itself to the error state and refresh all shards. Once every shard has reported that it has updated to the error state, the coordinator should retry indefinitely until it has removed all resharding metadata that currently exists.
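
The flow above can be sketched roughly as follows. This is an illustrative Python model only, not the server's C++ implementation: the names State, Shard, Coordinator, on_observed_write, refresh, remove_metadata and try_remove are all hypothetical, and the sketch assumes that a failed metadata removal surfaces as an exception that is safe to retry.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Set


class State(Enum):
    RUNNING = "running"
    ERROR = "error"


@dataclass
class Shard:
    """Hypothetical stand-in for a donor or recipient shard."""
    name: str
    state: State = State.RUNNING

    def refresh(self, coordinator_state: State) -> State:
        # On refresh the shard adopts the coordinator's error state and
        # reports its own resulting state back.
        if coordinator_state is State.ERROR:
            self.state = State.ERROR
        return self.state


@dataclass
class Coordinator:
    """Hypothetical model of the coordinator's error flow."""
    shards: List[Shard]
    # Assumed callback that removes metadata, raising OSError on failure.
    try_remove: Callable[[Set[str]], None]
    state: State = State.RUNNING
    metadata: Set[str] = field(
        default_factory=lambda: {"config.reshardingOperations"}
    )
    cleanup_attempts: int = 0

    def on_observed_write(self, shard_state: State) -> None:
        # Called by the (hypothetical) observer for each write it sees.
        if shard_state is not State.ERROR:
            return
        # Step 1: the coordinator moves itself to the error state.
        self.state = State.ERROR
        # Step 2: refresh every shard and collect the states they report.
        acks = [shard.refresh(self.state) for shard in self.shards]
        # Step 3: clean up only once all shards have reached the error state.
        if all(ack is State.ERROR for ack in acks):
            self.remove_metadata()

    def remove_metadata(self) -> None:
        # Retry until no metadata remains. A real implementation would
        # presumably add backoff; this sketch retries immediately.
        while self.metadata:
            self.cleanup_attempts += 1
            try:
                self.try_remove(self.metadata)
            except OSError:
                continue
```

In this sketch, cleanup starts only after every shard has acknowledged the error state, so a shard that is still running never sees its metadata removed underneath it.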

This ticket should also resolve all TODOs linked in SERVER-53006, and both tickets should then be closed in one commit. These TODOs are currently tied to the duplicate ticket SERVER-51800.

