The failCommand fail point should have an option to return the configured error only after executing the command normally (similar to how the failBeforeCommitExceptionCode option works for the onPrimaryTransactionalWrite fail point). Right now, the error is returned before executing the command. This is important for testing sharded transactions in drivers because the recoveryToken only allows us to recover the state of the transaction.
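As a rough sketch of what the request amounts to, the helper below builds the configureFailPoint document a driver test would send. The failCommands, errorCode and mode fields are the ones failCommand accepts today; the failAfterExecution key is a hypothetical name invented here for the requested option, and the function name and the error code in the usage lines are likewise only illustrative.

```python
from typing import Any, Dict, List


def fail_command_doc(commands: List[str], error_code: int, times: int = 1,
                     fail_after_execution: bool = False) -> Dict[str, Any]:
    """Build a configureFailPoint document for the failCommand fail point."""
    data: Dict[str, Any] = {
        "failCommands": commands,   # commands the fail point applies to
        "errorCode": error_code,    # error returned to the client
    }
    if fail_after_execution:
        # HYPOTHETICAL key: stands in for the option requested in this
        # ticket (run the command normally, then return the error).
        data["failAfterExecution"] = True
    return {
        "configureFailPoint": "failCommand",
        "mode": {"times": times},   # fail the next `times` matching commands
        "data": data,
    }


# Current behavior: the error is returned before commitTransaction runs.
before = fail_command_doc(["commitTransaction"], error_code=91)

# Requested behavior (hypothetical flag): commit runs, then the error is returned.
after = fail_command_doc(["commitTransaction"], error_code=91,
                         fail_after_execution=True)
```

A test would send either document to the admin database of the targeted mongos, for example with a driver's generic run-command helper.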
I did not see the need for this feature earlier because I had not connected the dots between how the recoveryToken works and how the fail point works. Specifically, a commit on a new mongos cannot initiate the 2PC procedure if the first commit was never initiated on the original mongos. Instead, it simply waits for the transaction to time out.
So right now, drivers can only test the following scenarios:
- commit succeeds on mongos A with no retry.
- commit fails on mongos A with no retry (because of a non-retryable error).
- commit fails on mongos A, retry succeeds on mongos A.
- commit fails on mongos A, retry fails on mongos A.
- commit fails on mongos A, retry fails on mongos B.
With this feature we can start testing the following:
- commit fails on mongos A (but it actually succeeds on the cluster), retry succeeds on mongos B because the initial commit succeeded.
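The difference between the existing behavior and the requested one can be illustrated with a toy model. This is not server code: the class, the function names and the "before"/"after" labels are all made up for illustration, and the model only encodes the rule described above, namely that a retry on another mongos can recover a commit that was initiated but cannot initiate one itself.

```python
from typing import Optional


class ToyCluster:
    """Toy stand-in for the cluster's view of a single transaction."""

    def __init__(self) -> None:
        self.commit_initiated = False
        self.committed = False


def commit_on_mongos_a(cluster: ToyCluster, fail_point: Optional[str]) -> str:
    """First commit attempt. fail_point is None, "before", or "after"."""
    if fail_point == "before":
        # Existing behavior: the error is returned before the command
        # executes, so the commit is never initiated.
        return "error"
    cluster.commit_initiated = True
    cluster.committed = True
    if fail_point == "after":
        # Requested behavior: the command executes, then the error is returned.
        return "error"
    return "ok"


def retry_on_mongos_b(cluster: ToyCluster) -> str:
    """Retry on a different mongos, armed only with the recovery token."""
    if cluster.commit_initiated and cluster.committed:
        return "ok"
    # The real retry would wait for the transaction to time out; the toy
    # model collapses that wait into an immediate failure.
    return "error"


def run_scenario(fail_point: Optional[str]) -> tuple:
    """Return (result on mongos A, result of the retry on mongos B)."""
    cluster = ToyCluster()
    first = commit_on_mongos_a(cluster, fail_point)
    return first, retry_on_mongos_b(cluster)


if __name__ == "__main__":
    for fp in ("before", "after"):
        print(fp, run_scenario(fp))
```

In this model, only the "after" mode produces the combination of a failed first commit and a successful retry on mongos B, which is the scenario drivers cannot exercise today.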