Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
-
None
-
Cluster Scalability
-
Fully Compatible
-
ALL
-
ClusterScalability Jul7-Jul20
-
200
-
None
-
3
-
TBD
-
None
-
None
-
None
-
None
-
None
-
None
-
None
SERVER-106707 added InterruptedDueToReshardingCriticalSection to this list of error codes that should have the "TransientTransactionError" label. This is not correct or safe, for the following reasons:
1. It causes every command that fails with InterruptedDueToReshardingCriticalSection as its top-level error code to have the "TransientTransactionError" label. This is not safe because:
   - The "TransientTransactionError" label makes the driver and the internal transaction API retry the entire transaction.
   - The interrupt can occur after the transaction has started running commitTransaction. In that case it is not safe for the driver or the internal transaction API to retry the entire transaction; it is only safe to retry the commitTransaction command itself. Per the driver spec, the driver would retry only commitTransaction, rather than the entire transaction, if the response has the "RetryableWriteError" label.
2. It causes commitTransaction or abortTransaction commands that fail with InterruptedDueToReshardingCriticalSection as their write concern error code to have no error label at all (meaning no retries would be performed). Only the top-level error code is passed into isTransientTransactionError(), so if InterruptedDueToReshardingCriticalSection is in wcCode, the "TransientTransactionError" label is not appended. As described in (1), it would not be safe for the response to have the "TransientTransactionError" label anyway.
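The two failure modes above can be illustrated with a deliberately simplified, hypothetical model of the labeling behavior this ticket describes after SERVER-106707. The function names and string constants are illustrative assumptions, not the server's actual error-labeling code or API. As in point (2), the model consults only the top-level code when deciding on the transient label.

```python
# Hypothetical, simplified model of error labeling after SERVER-106707.
# Names are illustrative; this is not the server's error-labeling code.
TRANSIENT_TXN = "TransientTransactionError"
RETRYABLE_WRITE = "RetryableWriteError"
RESHARDING_CS = "InterruptedDueToReshardingCriticalSection"

# Assumption: the code is in the transient list but not in the retriable category.
TRANSIENT_CODES = {RESHARDING_CS}
RETRIABLE_CODES = set()


def is_transient_transaction_error(code):
    return code in TRANSIENT_CODES


def labels_before_fix(command, code=None, wc_code=None):
    """Return error labels for a command run inside a transaction.

    code is the top-level error code (None if the command itself succeeded);
    wc_code is the write concern error code (None if there was none).
    """
    labels = set()
    # Only the top-level code is passed in, so wc_code is never consulted here.
    if is_transient_transaction_error(code):
        labels.add(TRANSIENT_TXN)
    is_commit_or_abort = command in ("commitTransaction", "abortTransaction")
    # RetryableWriteError requires a retriable code, which this one is not.
    if is_commit_or_abort and (code in RETRIABLE_CODES or wc_code in RETRIABLE_CODES):
        labels.add(RETRYABLE_WRITE)
    return labels


# Reason (1): an interrupted commitTransaction asks for a full-transaction retry.
print(labels_before_fix("commitTransaction", code=RESHARDING_CS))
# Reason (2): the same code as a write concern error yields no label at all.
print(labels_before_fix("commitTransaction", wc_code=RESHARDING_CS))
```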
Given this, we should make InterruptedDueToReshardingCriticalSection a retriable error instead and remove it from this list. Then:
- If the interrupt occurs while the transaction is still running a read/write command, the response would have the "TransientTransactionError" label.
- If the interrupt occurs while the transaction has started running commitTransaction or abortTransaction, the response would have the "RetryableWriteError" label instead of "TransientTransactionError" label because of this and this.
- If the interrupt occurs while the transaction is waiting for write concern for the commitTransaction or abortTransaction command, the response would have the "RetryableWriteError" label (here).
- Upon seeing a "TransientTransactionError" error, the driver and transaction API would retry the entire transaction with a higher txnNumber.
- Upon seeing a "RetryableWriteError" error, the driver and transaction API would retry the commitTransaction or abortTransaction command instead of the entire transaction.
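A companion sketch of the proposed behavior, under the same kind of hypothetical model: InterruptedDueToReshardingCriticalSection is treated as retriable rather than listed as transient, and a `driver_action` helper (also hypothetical) maps the resulting labels to the retries described in the last two bullets. It mirrors the ticket's description only and is not the server's or any driver's implementation.

```python
# Hypothetical model of the proposed labeling: the code is retriable, not transient.
TRANSIENT_TXN = "TransientTransactionError"
RETRYABLE_WRITE = "RetryableWriteError"
RESHARDING_CS = "InterruptedDueToReshardingCriticalSection"

RETRIABLE_CODES = {RESHARDING_CS}
COMMIT_OR_ABORT = ("commitTransaction", "abortTransaction")


def labels_after_fix(command, code=None, wc_code=None):
    """Labels for a command inside a transaction under the proposed change."""
    labels = set()
    if command in COMMIT_OR_ABORT:
        # Interrupted during commit/abort or while waiting for write concern:
        # only the commit/abort command itself may be retried.
        if code in RETRIABLE_CODES or wc_code in RETRIABLE_CODES:
            labels.add(RETRYABLE_WRITE)
    elif code in RETRIABLE_CODES:
        # Interrupted during a read/write command: the whole transaction may be retried.
        labels.add(TRANSIENT_TXN)
    return labels


def driver_action(labels):
    """How a driver or the internal transaction API reacts to the labels."""
    if TRANSIENT_TXN in labels:
        return "retry entire transaction with a higher txnNumber"
    if RETRYABLE_WRITE in labels:
        return "retry commitTransaction/abortTransaction only"
    return "no retry"


for command, code, wc_code in [
    ("insert", RESHARDING_CS, None),
    ("commitTransaction", RESHARDING_CS, None),
    ("abortTransaction", None, RESHARDING_CS),
]:
    labels = labels_after_fix(command, code, wc_code)
    print(command, sorted(labels), "->", driver_action(labels))
```

The point the model makes is that the safe label depends on how far the transaction got, not only on the error code, which is why a fixed entry in the transient list cannot be correct for every command.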
Note that setFCV similarly aborts unprepared transactions with InterruptedDueToFCVChange, which SERVER-100456 made a retriable error.
Is related to:
- SERVER-106707 Resharding donors should abort in-progress unprepared transactions upon transitioning to "preparing-to-block-writes" to lower the chance of critical section timeout (Closed)
- SERVER-100456 FCVOpObserver interrupts unprepared transactions with a non-retriable error (Closed)
Related to:
- SERVER-107397 Resharding donors should only abort unprepared transactions upon transitioning to the "preparing-to-block-writes" when the FCV is 8.2+ (Closed)