A resharding operation can fail to commit because it reuses an old txnNumber. The bug was exposed after the addition of OSI replay protection (SPM-4126).

The following sequence of conditions triggers it:

Resharding is in the commit phase, just after the commit notification for change streams has been sent.
An addShard operation runs concurrently. During addShard, _shardsvrDrainOngoingDDLOperations is sent, which triggers killSessionsAbortUnpreparedTransactions.
This aborts the coordinator's writeDecisionPersistedState transaction with InterruptedDueToAddShard.
_commitAndFinishReshardOperation retries, resending the change stream notifications and advancing the session txnNumber.
When the retried commit succeeds, the stale updatedCoordinatorDoc is installed in memory with the old session/txnNumber instead of being updated with the higher txnNumber.
The second retry of _generateCommitNotificationForChangeStreams (the first one already succeeded) then reads the old txnNumber from the coordinator doc's session via _getSession, so a txnNumber that has already been consumed is reused.
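
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's code: ToySession, simulate, and the install_higher_txn_number flag are hypothetical names, and replay protection is reduced to a single rule that a txnNumber must be strictly higher than any previously used one.

```python
class StaleTxnNumberError(Exception):
    """Raised by the toy session when a txnNumber does not advance."""


class ToySession:
    """Hypothetical stand-in for replay protection on one session."""

    def __init__(self):
        self.highest_seen = 0

    def use(self, txn_number):
        # Assumption: a txnNumber is accepted only if strictly increasing.
        if txn_number <= self.highest_seen:
            raise StaleTxnNumberError(
                f"txnNumber {txn_number} <= highest seen {self.highest_seen}"
            )
        self.highest_seen = txn_number


def simulate(install_higher_txn_number):
    session = ToySession()
    in_memory_doc = {"txnNumber": 1}      # coordinator doc held in memory
    live_txn = in_memory_doc["txnNumber"]  # txnNumber tracked by the retry loop

    # Attempt 1: the notification succeeds, then the commit transaction
    # is aborted by the concurrent addShard. Nothing is installed.
    live_txn += 1
    session.use(live_txn)

    # Retry: the notification is resent with an advanced txnNumber and
    # the commit succeeds this time.
    stale_doc = dict(in_memory_doc)  # still carries the old txnNumber
    live_txn += 1
    session.use(live_txn)
    if install_higher_txn_number:
        in_memory_doc = {**stale_doc, "txnNumber": live_txn}
    else:
        in_memory_doc = stale_doc  # modeled bug: old txnNumber installed

    # Second notification retry: derives its txnNumber from the in-memory doc.
    try:
        session.use(in_memory_doc["txnNumber"] + 1)
        return "ok"
    except StaleTxnNumberError:
        return "rejected"
```

In this model, simulate(False) installs the stale doc and the final notification is rejected, while simulate(True) carries the higher txnNumber forward and the notification is accepted.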