- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- Fully Compatible
- ALL
- 200
A resharding operation can fail to commit because it reuses an old txnNumber. The bug was exposed after the addition of OSI replay protection (SPM-4126).
The following sequence of conditions triggers it:
- Resharding is in the commit phase, specifically just after the commit notification for change streams has been sent.
- An addShard operation runs concurrently. During addShard, _shardsvrDrainOngoingDDLOperations is sent, which triggers killSessionsAbortUnpreparedTransactions. This aborts the coordinator's writeDecisionPersistedState transaction with InterruptedDueToAddShard.
- _commitAndFinishReshardOperation retries, resending the change stream notifications and advancing the session txnNumber.
- When the commit retry succeeds, the stale updatedCoordinatorDoc, which still carries the old session/txnNumber, is installed in memory instead of being updated with the higher txnNumber.
- A second retry of _generateCommitNotificationForChangeStreams (the first retry already succeeded) then reads the old txnNumber from the coordinator doc's session (_getSession).
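
The sequence above can be illustrated with a small toy model. This is only a sketch, not the server code: `CoordinatorDoc`, `ReplayProtectedServer`, `simulate` and the `carry_forward_txn` flag are hypothetical names, and it assumes that each notification consumes one txnNumber and that replay protection rejects any txnNumber that is not higher than the last one accepted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoordinatorDoc:
    """Toy stand-in for the coordinator doc (hypothetical, not the real schema)."""
    state: str
    txn_number: int


class ReplayProtectedServer:
    """Assumed replay protection: accept only strictly increasing txnNumbers."""

    def __init__(self) -> None:
        self.last_seen = -1

    def run(self, txn_number: int) -> bool:
        if txn_number <= self.last_seen:
            return False  # stale txnNumber, treated as a replay
        self.last_seen = txn_number
        return True


def simulate(carry_forward_txn: bool) -> bool:
    """Return True if the final notification retry is accepted."""
    server = ReplayProtectedServer()
    doc = CoordinatorDoc(state="committing", txn_number=5)

    # Copy of the doc prepared before any retry; its txnNumber never advances.
    updated_doc = replace(doc, state="committed")

    # Attempt 1: the notification succeeds, then the decision write is
    # interrupted (the addShard case). Attempt 2: both steps succeed.
    for decision_write_succeeds in (False, True):
        assert server.run(doc.txn_number)
        doc = replace(doc, txn_number=doc.txn_number + 1)
        if decision_write_succeeds:
            if carry_forward_txn:
                # Keep the advanced txnNumber when installing the new doc.
                doc = replace(updated_doc, txn_number=doc.txn_number)
            else:
                # Install the stale copy, regressing the txnNumber.
                doc = updated_doc

    # A later notification retry reads the txnNumber from the in-memory doc.
    return server.run(doc.txn_number)
```

In this model, `simulate(carry_forward_txn=False)` ends with the last notification rejected because the installed doc regressed to an already used txnNumber, while `simulate(carry_forward_txn=True)` lets it through.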