[SERVER-49578] Handle reported unrecoverable errors from donors/recipients in the coordinator Created: 16/Jul/20 Updated: 06/Dec/22 Resolved: 31/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Blake Oler | Assignee: | [DO NOT USE] Backlog - Sharding NYC |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-error-flow |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Sharding NYC |
| Participants: | |
| Story Points: | 3 |
| Description |
|
The coordinator observer will see the write to config.reshardingOperations that changes a shard's state to error. The coordinator will then update itself to the error state and refresh all shards. Once all shards have reported that they have updated to the error state, the coordinator should retry indefinitely until it has removed all resharding metadata that currently exists. Also take care of all TODOs linked in |
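
The error flow described above can be sketched roughly as follows. This is an illustrative Python model only, not the server's C++ implementation: the `Shard` and `Coordinator` classes, the `try_remove` callback, and the metadata item names are all hypothetical, and the real coordinator would react to writes on config.reshardingOperations rather than to a direct method call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Set


class State(Enum):
    RUNNING = "running"
    ERROR = "error"


@dataclass
class Shard:
    """Hypothetical stand-in for a donor or recipient shard."""
    name: str
    state: State = State.RUNNING

    def refresh(self, coordinator_state: State) -> State:
        # On refresh, the shard adopts the coordinator's error state
        # and reports its own state back.
        if coordinator_state is State.ERROR:
            self.state = State.ERROR
        return self.state


@dataclass
class Coordinator:
    """Hypothetical model of the coordinator's error handling."""
    shards: List[Shard]
    state: State = State.RUNNING
    # Placeholder names for resharding metadata (assumed for illustration).
    metadata: Set[str] = field(
        default_factory=lambda: {"coordinator_doc", "temp_collection"}
    )

    def on_shard_error(self, shard_name: str,
                       try_remove: Callable[[str], bool]) -> int:
        # 1. The observer saw a write that moved a shard to error.
        for shard in self.shards:
            if shard.name == shard_name:
                shard.state = State.ERROR
        # 2. The coordinator moves itself to the error state.
        self.state = State.ERROR
        # 3. Refresh every shard and collect the states they report.
        acks = [shard.refresh(self.state) for shard in self.shards]
        if not all(ack is State.ERROR for ack in acks):
            return 0  # not every shard has acknowledged yet
        # 4. Keep trying until all metadata is gone.
        return self.remove_all_metadata(try_remove)

    def remove_all_metadata(self, try_remove: Callable[[str], bool]) -> int:
        passes = 0
        while self.metadata:  # retry for as long as anything remains
            passes += 1
            for item in sorted(self.metadata):
                if try_remove(item):
                    self.metadata.discard(item)
        return passes
```

In this sketch the cleanup loop retries immediately; a real implementation would presumably back off between attempts and survive coordinator failover, neither of which is modeled here.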