The Rust driver had a bug where retries of commitTransaction would fail if the error that caused the retry occurred during connection check out. This is because the Rust driver would only generate a transaction ID after a connection was checked out, since it used the cached hello response from the connection to determine if retryability should be used for that operation.
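To make the ordering problem concrete, here is a toy model of it. It is not the driver's code: `execute`, its parameters, and the error strings are hypothetical. How the retry actually failed is also an assumption; the model assumes the retry path expected the first attempt to have already generated the ID.

```rust
// Toy model only: `execute` and its parameters are hypothetical, not driver APIs.

/// Simulates one operation with at most one retry.
///
/// `checkouts[i]` says whether connection check-out succeeds on attempt `i`.
/// `id_before_checkout` selects the fixed ordering (ID generated up front)
/// or the buggy ordering (ID generated only after the first check-out).
pub fn execute(checkouts: &[bool], id_before_checkout: bool) -> Result<u64, String> {
    let mut txn_id: Option<u64> = if id_before_checkout { Some(1) } else { None };

    for (attempt, &checked_out) in checkouts.iter().enumerate().take(2) {
        if !checked_out {
            // Check-out failed; treat it as retryable and try again.
            continue;
        }
        if attempt == 0 && txn_id.is_none() {
            // Buggy ordering: the ID is only generated here, on the first
            // attempt, once a connection (and its hello response) is available.
            txn_id = Some(1);
        }
        // Assumption: a retry expects the ID to exist already.
        return txn_id.ok_or_else(|| "InvalidOptions: retry has no transaction ID".to_string());
    }
    Err("connection check-out failed on every attempt".to_string())
}
```

With the buggy ordering, `execute(&[false, true], false)` returns an error even though the second check-out succeeds. With the ID generated up front, `execute(&[false, true], true)` succeeds.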
We've added a test for this issue to Rust's test suite, but it might be worth adding a spec test to catch this in other drivers. Not sure whether this is widely applicable, though.
The test we added is as follows:
- Ensure the deployment is a replica set and supports failing initial handshakes with appName via failCommand
- Create a client with SDAM and CMAP monitoring enabled, and a really high heartbeatFrequencyMS
- Start a ClientSession
- Start a transaction on the session
- Insert one document using the session
- Enable a failpoint on the "ping" command with error code 11600 (InterruptedAtShutdown)
- This will cause the connection pools to be cleared, ensuring commitTransaction attempts to create a new connection
- Run a "ping" command
- Using SDAM monitoring, ensure the server is marked as unknown and then rediscovered
- Failing with a state change error requests an immediate check, so heartbeatFrequencyMS being high won't affect this
- Enable a failpoint on the legacy hello and hello commands that fails once with the 11600 error code and the appName of the client
- Invoke session.commitTransaction, and verify it succeeds
- Via CMAP events, ensure a ConnectionCheckOutFailedEvent and then a ConnectionCheckedOut event were seen
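For reference, the two failpoints in the steps above might look roughly like the documents built below. This is a sketch, not the test's actual code: `fail_command` is a hypothetical helper that formats the JSON by hand, `times: 1` on the "ping" failpoint is an assumption (the steps only say "fails once" for the hello failpoint), `isMaster` is the legacy hello command name, and the app name in the usage note is made up.

```rust
// Hypothetical helper: builds a `configureFailPoint` command as a JSON string.
// Assumes command and application names contain no characters needing JSON escaping.
pub fn fail_command(commands: &[&str], error_code: i32, app_name: Option<&str>) -> String {
    let names = commands
        .iter()
        .map(|c| format!(r#""{c}""#))
        .collect::<Vec<_>>()
        .join(",");
    // `appName` restricts the failpoint to connections from the matching client.
    let app = app_name
        .map(|a| format!(r#","appName":"{a}""#))
        .unwrap_or_default();
    format!(
        r#"{{"configureFailPoint":"failCommand","mode":{{"times":1}},"data":{{"failCommands":[{names}],"errorCode":{error_code}{app}}}}}"#
    )
}
```

The "ping" failpoint would then be `fail_command(&["ping"], 11600, None)`, and the handshake failpoint would be `fail_command(&["isMaster", "hello"], 11600, Some("commitRetryTest"))`, where the app name matches the one the test client was created with.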
Who is the affected end user?
Driver authors and potentially end users if the bug is found.
How does this affect the end user?
Some commitTransaction operations will fail with an InvalidOptions error through no fault of the user if the bug is present.
How likely is it that this problem or use case will occur?
Edge case, pretty rare.
If the problem does occur, what are the consequences and how severe are they?
Committing a transaction will fail.
Is this issue urgent?
Is this ticket required by a downstream team?
Is this ticket only for tests?
Well, it's a test improvement that may or may not lead to bug fixes, depending on whether the driver is affected.