Currently, on the first recipientSyncData command (without returnAfterReachingDonorTimestamp), we wait on the dataConsistent future. If the future chain is interrupted (due to stepdown, etc.), we override the error status with the interrupt status so that the donor is able to retry the recipientSyncData command.
However, the second recipientSyncData command (with returnAfterReachingDonorTimestamp) simply waits for an optime to be majority committed and does not override the error status. We should handle it the same way: when the wait fails because of an interrupt, override the error status with the interrupt status so that the donor is able to retry the recipientSyncData command without aborting the migration.
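The status override described above can be sketched roughly as follows. This is an illustrative model in Python, not the server code: the Status class, the resolve_wait_status and donor_should_retry helpers, and the set of retryable codes are all hypothetical names introduced for this sketch, and the error code strings are used only as labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for a command status: a code plus a reason."""
    code: str
    reason: str = ""

    @property
    def ok(self) -> bool:
        return self.code == "OK"


# Assumption for this sketch: codes the donor treats as retryable.
RETRYABLE_CODES = frozenset({"InterruptedDueToReplStateChange"})


def resolve_wait_status(wait_status: Status,
                        interrupt_status: Optional[Status]) -> Status:
    """Return the status to report for a wait.

    If the wait failed and an interrupt (e.g. stepdown) was recorded,
    report the interrupt status instead of the original wait error.
    """
    if wait_status.ok:
        return wait_status
    if interrupt_status is not None and not interrupt_status.ok:
        return interrupt_status
    return wait_status


def donor_should_retry(status: Status) -> bool:
    """Hypothetical donor-side check: retry only on retryable codes."""
    return status.code in RETRYABLE_CODES
```

In this model, a wait that fails with some unrelated error while a stepdown interrupt is recorded gets reported as the interrupt, so the donor retries instead of aborting the migration.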
A second bug was found, which only shows up once this issue is resolved. When the donor retries, the second recipientSyncData command is sent to the new primary of the recipient RST. At that point the tenant oplog applier may exist but not have been started yet. This bug manifests as an invariant failure here after being called here.
The fix is to wait on the _dataConsistentPromise instead of the _dataSyncStartedPromise.
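A toy model of why the choice of promise matters, written in Python with concurrent.futures.Future standing in for the two promises. It assumes, for illustration only, that the applier object is created when data sync starts and is started before data becomes consistent; the Recipient and OplogApplier classes and their methods are hypothetical and do not mirror the real implementation.

```python
from concurrent.futures import Future


class OplogApplier:
    """Hypothetical applier that must be started before it is used."""

    def __init__(self):
        self.started = False

    def start(self):
        self.started = True

    def wait_for_optime(self, optime):
        # Models the invariant: using an unstarted applier is a bug.
        if not self.started:
            raise RuntimeError("invariant failure: applier not started")
        return optime


class Recipient:
    def __init__(self):
        self.data_sync_started = Future()
        self.data_consistent = Future()
        self.applier = None

    def begin_sync(self):
        # Assumption: the applier exists from here on, but is not started.
        self.applier = OplogApplier()
        self.data_sync_started.set_result(None)

    def reach_consistency(self):
        # Assumption: the applier is started before consistency is signalled.
        self.applier.start()
        self.data_consistent.set_result(None)

    def wait_until_timestamp(self, optime, timeout=None):
        # Waiting on data_consistent (not data_sync_started) means the
        # applier has been started by the time it is used.
        self.data_consistent.result(timeout=timeout)
        return self.applier.wait_for_optime(optime)
```

Under these assumptions, a caller that only waited on data_sync_started could reach the applier in the window where it exists but is unstarted, which is the state the invariant rejects.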