The _flushReshardingStateChange command doesn't attempt to join any shard version refresh that is already running. This was believed to be safe because shard version refreshes aren't possible while the critical section is held. However, an earlier _flushReshardingStateChange command that was interrupted by stepdown may have called CollectionShardingRuntime::setShardVersionRecoverRefreshFuture(), and the RecoverRefreshThread may not yet have finished running the shard version refresh, at the end of which it would have called CollectionShardingRuntime::resetShardVersionRecoverRefreshFuture().
As a result, the later _flushReshardingStateChange command hits the invariant in CollectionShardingRuntime::setShardVersionRecoverRefreshFuture().
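The race described above can be illustrated with a small, self-contained C++ sketch. It is a toy model under stated assumptions, not the server code: ToyRefreshSlot and secondFlushTripsInvariant are hypothetical names, the slot only loosely mirrors the set/reset pair on CollectionShardingRuntime, and an exception stands in for the invariant so the sketch stays testable.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-collection refresh future slot.
// This is NOT the real CollectionShardingRuntime.
class ToyRefreshSlot {
public:
    // Models setShardVersionRecoverRefreshFuture(): the slot must be empty.
    void set(std::string owner) {
        if (_owner) {
            // The server uses an invariant here; an exception is an
            // assumption made so this sketch can be exercised safely.
            throw std::logic_error("refresh future already set by " + *_owner);
        }
        _owner = std::move(owner);
    }

    // Models resetShardVersionRecoverRefreshFuture(), called once the
    // refresh thread has finished.
    void reset() { _owner.reset(); }

private:
    std::optional<std::string> _owner;
};

// Returns true if a second flush command would trip the check.
bool secondFlushTripsInvariant(bool refreshThreadFinishedFirst) {
    ToyRefreshSlot slot;
    slot.set("flush #1 (interrupted by stepdown)");
    if (refreshThreadFinishedFirst) {
        slot.reset();  // the earlier refresh completed before flush #2 ran
    }
    try {
        slot.set("flush #2");
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this model the second command only fails when the earlier refresh has not yet reset the slot, which is the window left open by a stepdown-interrupted command.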
Proposed solution: Rather than making the _flushReshardingStateChange command join a shard version refresh triggered by an earlier instance of the command, we could introduce a new _shardsvrCommitReshardCollection command analogous to the _shardsvrAbortReshardCollection command introduced in SERVER-56638. The _shardsvrCommitReshardCollection command would:
- Call _coordinatorHasDecisionPersisted.emplaceValue().
- Wait on DonorStateMachine::getCompletionFuture() and RecipientStateMachine::getCompletionFuture().
- Wait for the latest optime to become majority-committed.
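The three steps above could be sketched roughly as follows. This is a single-threaded toy model, not the proposed implementation: ToyStateMachine and toyCommitReshardCollection are hypothetical names, boolean flags stand in for the real promise and completion futures, and the majority-commit wait is reduced to a recorded step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for DonorStateMachine / RecipientStateMachine.
struct ToyStateMachine {
    bool decisionPersisted = false;  // models _coordinatorHasDecisionPersisted
    bool completed = false;

    void signalDecisionPersisted() { decisionPersisted = true; }

    // Models waiting on getCompletionFuture(): in this toy, the machine can
    // only complete after it has been told the decision was persisted.
    bool awaitCompletion() {
        if (decisionPersisted) {
            completed = true;
        }
        return completed;
    }
};

// Runs the three proposed steps in order and records what happened.
std::vector<std::string> toyCommitReshardCollection(ToyStateMachine& donor,
                                                    ToyStateMachine& recipient) {
    std::vector<std::string> steps;

    // Step 1: tell both state machines the coordinator's decision is durable.
    donor.signalDecisionPersisted();
    recipient.signalDecisionPersisted();
    steps.push_back("decision signaled");

    // Step 2: wait for both state machines to run to completion.
    if (!donor.awaitCompletion() || !recipient.awaitCompletion()) {
        steps.push_back("state machines incomplete");
        return steps;
    }
    steps.push_back("state machines completed");

    // Step 3: placeholder for waiting until the latest optime is
    // majority-committed (no replication is modeled here).
    steps.push_back("majority commit awaited");
    return steps;
}
```

The point of the ordering is that the command only returns after both state machines have finished and their writes are durable, so the caller never needs to join a refresh started by an earlier command.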
With the proposed _shardsvrCommitReshardCollection command, DonorStateMachine and RecipientStateMachine would additionally need to call CollectionShardingRuntime::clearFilteringMetadata() before releasing the critical section. This guarantees that, once the donor shard has dropped the original collection, a stale mongos is told to refresh its shard version rather than receiving a "no documents" response. DonorStateMachine and RecipientStateMachine should also call onShardVersionMismatch() after releasing the critical section, so that they eagerly refresh their shard version and learn of the new collection epoch before the first operation on the namespace being resharded comes in.
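The ordering requirement can be made concrete with a simplified model of how a donor shard might answer a read from a stale mongos. Everything here is an assumption for illustration: ToyDonorShard, Reply, the integer epochs, and staleReadAfterCommit are hypothetical, and real shard versioning involves far more state than a single cached epoch.

```cpp
#include <cassert>
#include <optional>

enum class Reply { BlockedByCriticalSection, StaleConfigPleaseRefresh, Documents, NoDocuments };

constexpr int kOldEpoch = 1;  // epoch of the original collection
constexpr int kNewEpoch = 2;  // epoch of the resharded collection

// Hypothetical, heavily simplified donor shard.
struct ToyDonorShard {
    bool criticalSectionHeld = true;
    bool originalCollectionDropped = false;
    // Cached filtering metadata; std::nullopt means "unknown, must refresh".
    std::optional<int> cachedEpoch = kOldEpoch;

    Reply handleRead(int routerEpoch) const {
        if (criticalSectionHeld) {
            return Reply::BlockedByCriticalSection;
        }
        if (!cachedEpoch || *cachedEpoch != routerEpoch) {
            return Reply::StaleConfigPleaseRefresh;
        }
        // Versions appear to match, so the read is served as-is.
        if (routerEpoch == kOldEpoch && originalCollectionDropped) {
            return Reply::NoDocuments;
        }
        return Reply::Documents;
    }
};

// Reply seen by a stale router (still on the old epoch) after commit.
Reply staleReadAfterCommit(bool clearMetadataBeforeRelease, bool eagerRefreshAfterRelease) {
    ToyDonorShard shard;
    shard.originalCollectionDropped = true;
    if (clearMetadataBeforeRelease) {
        shard.cachedEpoch.reset();  // models clearFilteringMetadata()
    }
    shard.criticalSectionHeld = false;  // release the critical section
    if (eagerRefreshAfterRelease) {
        shard.cachedEpoch = kNewEpoch;  // models onShardVersionMismatch()
    }
    return shard.handleRead(kOldEpoch);
}
```

In this model, skipping the metadata clear lets the stale read match the cached old epoch and come back empty, while clearing it first forces a refresh; the eager refresh afterwards only changes how quickly the shard learns the new epoch, not the safety of the reply.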