[SERVER-80930] reshardCollection command can return ReshardCollectionAborted instead of actual failure status code Created: 09/Sep/23  Updated: 15/Sep/23  Resolved: 15/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 7.1.0-rc2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Abdul Qadeer Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-73897 Resharding coordinator returns generi... Backlog
Assigned Teams:
Sharding NYC
Operating System: ALL
Participants:
Linked BF Score: 5

 Description   

If resharding fails with an error that is not user-initiated (so the user would not expect ReshardCollectionAborted) and the config server then steps down and steps up, the newly elected config server primary recovers the abort decision by checking the state document and signaling the context holder to abort. In doing so, it overwrites the status with ReshardCollectionAborted, incorrectly treating the abort as user-initiated. Note that the original status is present in memory in the onError handler of the previous config server primary's ReshardingCoordinatorService.

When checking whether the context holder is aborted, we should additionally check whether the abort was user-initiated and return the correct status code.
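
The proposed check can be illustrated with a small model. This is only a sketch in Python, not the server's C++ code: the `CoordinatorStateDoc` class, its `user_initiated` and `abort_reason` fields, and both function names are hypothetical, and `INTERNAL_ERROR` is just a stand-in for whatever non-user failure caused the abort. It assumes the original failure status is available to the new primary after step up, which the ticket does not confirm.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorCode(Enum):
    RESHARD_COLLECTION_ABORTED = "ReshardCollectionAborted"
    INTERNAL_ERROR = "InternalError"  # stand-in for any non-user failure


@dataclass(frozen=True)
class Status:
    code: ErrorCode
    reason: str


@dataclass
class CoordinatorStateDoc:
    """Hypothetical model of the persisted coordinator state document."""
    aborting: bool = False
    user_initiated: bool = False            # assumed flag, not from the ticket
    abort_reason: Optional[Status] = None   # assumed: original failure, if recorded


USER_ABORT = Status(ErrorCode.RESHARD_COLLECTION_ABORTED, "aborted by user")


def recovered_status_current(doc: CoordinatorStateDoc) -> Optional[Status]:
    """Models the reported behavior: every recovered abort looks user-initiated."""
    return USER_ABORT if doc.aborting else None


def recovered_status_proposed(doc: CoordinatorStateDoc) -> Optional[Status]:
    """Models the suggested check: keep the original status for non-user aborts."""
    if not doc.aborting:
        return None
    if doc.user_initiated or doc.abort_reason is None:
        # Either the user asked for the abort, or no better status is known.
        return USER_ABORT
    return doc.abort_reason
```

With a document such as `CoordinatorStateDoc(aborting=True, abort_reason=Status(ErrorCode.INTERNAL_ERROR, "clone failed"))`, the first function reports ReshardCollectionAborted while the second returns the recorded failure. Falling back to ReshardCollectionAborted when no reason is recorded is a design choice of this sketch, not something the ticket specifies.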

