[SERVER-80930] reshardCollection command can return ReshardCollectionAborted instead of actual failure status code Created: 09/Sep/23 Updated: 15/Sep/23 Resolved: 15/Sep/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.0.0, 6.0.0, 7.0.0, 7.1.0-rc2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Abdul Qadeer | Assignee: | [DO NOT USE] Backlog - Sharding NYC |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding NYC |
| Operating System: | ALL |
| Participants: | |
| Linked BF Score: | 5 |
| Description |
|
If resharding fails because of an error that is not user-initiated (so the user would not expect ReshardCollectionAborted), and this is followed by a config server step-down and step-up, the newly elected config server primary recovers the abort decision by checking the state document and signaling the context holder to abort. In doing so, it overwrites the status with ReshardCollectionAborted, incorrectly treating the abort as user-initiated. Note that the original status is present in memory in the onError handler of the previous config server primary's ReshardingCoordinatorService. When checking whether the context holder is aborted, we should additionally check whether the abort was user-initiated and return the correct status code. |
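The proposed check can be illustrated with a minimal sketch. This is not the server's code: `ErrorCode`, `Status`, `AbortDecision`, and `statusForAbort` are hypothetical stand-ins. The sketch also assumes the original failure is still available to the node making the decision, which the description only confirms for the previous primary's memory.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins; these are not the MongoDB server's real types.
enum class ErrorCode { ReshardCollectionAborted, OriginalFailure };

struct Status {
    ErrorCode code;
    std::string reason;
};

// Hypothetical record of an abort decision recovered after step-up.
struct AbortDecision {
    bool userInitiated;                   // true only if the user requested the abort
    std::optional<Status> originalError;  // failure behind a non-user abort (assumed available)
};

// Choose the status to return to the reshardCollection caller.
Status statusForAbort(const AbortDecision& decision) {
    // A non-user abort should surface the real failure when it is known.
    if (!decision.userInitiated && decision.originalError) {
        return *decision.originalError;
    }
    // User-initiated aborts, or aborts whose cause is unknown, keep the generic code.
    return Status{ErrorCode::ReshardCollectionAborted, "resharding was aborted"};
}
```

With this shape, a user-initiated abort still yields ReshardCollectionAborted, while an abort caused by an internal failure reports that failure's own code.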