The ReshardingCoordinator relies on an exception being thrown and its .onError() handler being called to trigger its _shardsvrAbortReshardCollection flow. However, the ReshardingCoordinator does not read the current state of the coordinator document, so it does not trigger the _shardsvrAbortReshardCollection flow when an earlier config server primary had already decided that the resharding operation must abort. Because the .onError() handler is never called, the ReshardingCoordinator attempts to commit the resharding operation anyway. This is severely problematic because the resulting collection will be incomplete and inconsistent (i.e. writes are lost).
- Shards which had already received the _shardsvrAbortReshardCollection command from the earlier config server primary's resharding coordinator may have dropped the temporary resharding collection already. These shards effectively ignore the _shardsvrCommitReshardCollection command.
- Other shards which erroneously receive the _shardsvrCommitReshardCollection command will rename the temporary resharding collection over the source collection.
- Even shards which voted to abort the resharding operation (e.g. due to an unrecoverable error during collection cloning or oplog application) can still rename the temporary resharding collection over the source collection.
- However, shards which are in neither the "strict-consistency" state (recipient role) nor the "blocking-writes" state (donor role) will reject the _shardsvrCommitReshardCollection command. The ReshardCollectionInProgress error response returned to the resharding coordinator will lead the config server primary to fassert(). While fassert(5277000) is an indicator of this issue occurring, it isn't guaranteed that any shards will still be in a state to detect that the resharding coordinator delivered different decisions to different shards.
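The decision logic described above can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ implementation: the function names, the CoordinatorState enum, and the string results are all hypothetical and exist purely for illustration. Only the command names, the "aborting", "strict-consistency", and "blocking-writes" states, and the ReshardCollectionInProgress error come from the description.

```python
from enum import Enum


class CoordinatorState(Enum):
    # Only the states needed for this illustration (hypothetical enum).
    BLOCKING_WRITES = "blocking-writes"
    ABORTING = "aborting"


ABORT = "_shardsvrAbortReshardCollection"
COMMIT = "_shardsvrCommitReshardCollection"


def command_without_state_check(persisted_state, error_raised):
    """Reported behaviour: abort is reachable only via the error handler.

    The persisted coordinator state is deliberately ignored here.
    """
    return ABORT if error_raised else COMMIT


def command_with_state_check(persisted_state, error_raised):
    """Hypothetical alternative: also honour a previously persisted abort."""
    if error_raised or persisted_state is CoordinatorState.ABORTING:
        return ABORT
    return COMMIT


def shard_reaction(temp_collection_dropped, shard_state, command):
    """Toy model of how a single shard reacts, following the bullets above."""
    if command == ABORT:
        return "abort"
    if temp_collection_dropped:
        # The shard already processed an abort from the earlier primary.
        return "ignore commit"
    if shard_state in ("strict-consistency", "blocking-writes"):
        return "rename temporary collection over source"
    return "reject with ReshardCollectionInProgress"
```

In this model, a coordinator that starts up with the coordinator document already in the "aborting" state and sees no exception sends the commit command from command_without_state_check(), while command_with_state_check() sends the abort command. Feeding the commit command into shard_reaction() for shards in different states reproduces the mixed outcomes listed above.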
Thank you to chuck.zhang for discovering this issue while working on the automation restore procedure (which starts the config server up in the aborting state for the resharding operation).