[SERVER-73897] Resharding coordinator returns generic abort error after recovery from stepdown
| Created: | 10/Feb/23 | Updated: | 12/Dec/23 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Backlog - Cluster Scalability |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | cs-subteam1, sharding-nyc-subteam1 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Cluster Scalability |
| Operating System: | ALL |
| Participants: | |
| Linked BF Score: | 5 |
| Description |
|
When resharding aborts, the coordinator stores the abort reason in the coordinator document. If the coordinator then steps down and restarts, it cancels its abort token as soon as it sees that the persisted state is aborting. This in turn causes it to hit a CallbackCanceled error later (I suspect from here). The resharding coordinator then treats the failure as if the user had aborted resharding, and returns the generic ReshardingAborted error code instead of the original error code. |
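The sequence described above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the server's actual code: `CoordinatorDoc`, `Coordinator`, `OriginalFailure`, and all method names are hypothetical, and `resultPreferringPersistedReason` shows just one possible way to surface the stored reason.

```cpp
#include <cassert>

// Hypothetical stand-ins for server error codes. OriginalFailure represents
// whatever error actually caused resharding to abort.
enum class ErrorCode { OK, OriginalFailure, CallbackCanceled, ReshardingAborted };

// Hypothetical model of the persisted coordinator document.
struct CoordinatorDoc {
    bool aborting = false;
    ErrorCode abortReason = ErrorCode::OK;  // OK means "no reason stored"
};

// Hypothetical in-memory coordinator. In-memory state is lost on stepdown;
// a new instance is rebuilt from the persisted document.
class Coordinator {
public:
    explicit Coordinator(CoordinatorDoc& doc) : _doc(doc) {
        // On recovery, a persisted "aborting" state cancels the abort token.
        if (_doc.aborting) {
            _abortTokenCanceled = true;
        }
    }

    // Resharding failed: persist the reason and remember it in memory.
    void abortWith(ErrorCode reason) {
        assert(reason != ErrorCode::OK);
        _doc.aborting = true;
        _doc.abortReason = reason;
        _inMemoryError = reason;
    }

    // Error seen by the coordinator's run loop. After recovery there is no
    // in-memory error, only the cancellation caused by the canceled token.
    ErrorCode observedError() const {
        if (_inMemoryError != ErrorCode::OK) {
            return _inMemoryError;
        }
        return _abortTokenCanceled ? ErrorCode::CallbackCanceled : ErrorCode::OK;
    }

    // Behavior described in the ticket: a cancellation is treated like a
    // user-requested abort and mapped to the generic error code.
    ErrorCode resultAsReported() const {
        ErrorCode e = observedError();
        return e == ErrorCode::CallbackCanceled ? ErrorCode::ReshardingAborted : e;
    }

    // One possible alternative (an assumption, not the project's fix):
    // prefer the persisted abort reason when one exists.
    ErrorCode resultPreferringPersistedReason() const {
        ErrorCode e = observedError();
        if (e != ErrorCode::CallbackCanceled) {
            return e;
        }
        return _doc.abortReason != ErrorCode::OK ? _doc.abortReason
                                                 : ErrorCode::ReshardingAborted;
    }

private:
    CoordinatorDoc& _doc;
    bool _abortTokenCanceled = false;
    ErrorCode _inMemoryError = ErrorCode::OK;
};
```

In this model, a coordinator that aborts without a stepdown reports `OriginalFailure`. A second `Coordinator` constructed from the same document (standing in for recovery after a stepdown) reports `ReshardingAborted` from `resultAsReported()`, while `resultPreferringPersistedReason()` still yields `OriginalFailure`.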
| Comments |
| Comment by Randolph Tan [ 10/Feb/23 ] |
|
Attached a diff with two C++ test cases: one checks that the resharding coordinator returns the original error code in the normal case without a stepdown (this passes), and one checks that it also returns the original error code after a stepdown (this fails). |