Distributed transactions that write to documents on exactly one shard and read documents from at least one other shard may execute more than once when a primary failover occurs.
This issue does not affect multi-document transactions that involve only a single shard or that write to multiple shards.
When a client attempts to commit a multi-document transaction, the driver receives one of the following responses to the commitTransaction command:
- The transaction has definitively committed.
- The transaction has definitively aborted.
- It is unknown whether the transaction has committed or aborted.
Clients using either the driver’s callback transactions API or the driver’s core transactions API automatically retry the commitTransaction command to learn the definitive result of the transaction.
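The retry behavior described above can be sketched as a small, self-contained model. This is not driver code: `CommitOutcome`, `commit_with_retry`, and `fake_send_commit` are hypothetical names introduced for illustration, and the cap on attempts is an assumption. The sketch only shows the idea of resending the commit until the response is definitive.

```python
from enum import Enum


class CommitOutcome(Enum):
    """The three possible responses to a commit attempt (illustrative model)."""
    COMMITTED = "committed"
    ABORTED = "aborted"
    UNKNOWN = "unknown"


def commit_with_retry(send_commit, max_attempts=5):
    """Resend the commit until the outcome is definitive or attempts run out.

    `send_commit` is a hypothetical callable standing in for one
    commitTransaction round trip; it returns a CommitOutcome.
    """
    outcome = CommitOutcome.UNKNOWN
    for _ in range(max_attempts):
        outcome = send_commit()
        if outcome is not CommitOutcome.UNKNOWN:
            break  # definitive answer: committed or aborted
    return outcome


# Usage: the first two responses are inconclusive, the third is definitive.
responses = iter([
    CommitOutcome.UNKNOWN,
    CommitOutcome.UNKNOWN,
    CommitOutcome.COMMITTED,
])
calls = []


def fake_send_commit():
    calls.append(1)  # record each simulated round trip
    return next(responses)


result = commit_with_retry(fake_send_commit)
```

In this model the retry is only safe if a resent commit reports the true outcome of the original transaction, which is the assumption the router bug breaks.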
Due to a bug in the router, the driver may be wrongly told “the transaction has definitively aborted and its operation should be automatically retried in a new transaction” when the transaction has in fact committed successfully. The bug manifests when the commitTransaction command must be retried to learn the definitive result of the transaction and, in the intervening time, a primary failover has occurred on one of the shards from which documents were only read.
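To make the consequence concrete, here is a minimal simulation of how a wrongly reported abort leads to a second execution. It is a toy model, not the server or driver implementation: `BuggyCluster`, `TransientAbort`, and `run_transaction` are hypothetical names, and the single increment stands in for whatever the transaction writes to its one write shard.

```python
class TransientAbort(Exception):
    """Stand-in for an abort that the client treats as safe to retry
    in a new transaction (hypothetical, for illustration only)."""


class BuggyCluster:
    """Toy model: the write is made durable, yet the first commit is
    reported as aborted, mimicking the router bug after a failover on
    a shard that was only read from."""

    def __init__(self):
        self.counter = 0       # durable state on the single write shard
        self.commit_calls = 0

    def commit(self, pending_increment):
        self.commit_calls += 1
        self.counter += pending_increment  # the write becomes durable
        if self.commit_calls == 1:
            # Modeled bug: committed, but the client is told it aborted.
            raise TransientAbort()


def run_transaction(cluster, max_transactions=3):
    """Run the transaction body and retry it whenever an abort is reported.

    Returns the number of times the body was executed.
    """
    for attempt in range(1, max_transactions + 1):
        pending = 1  # transaction body: increment the counter by one
        try:
            cluster.commit(pending)
            return attempt
        except TransientAbort:
            continue  # client believes nothing was written and retries
    raise RuntimeError("transaction did not commit")


cluster = BuggyCluster()
attempts = run_transaction(cluster)
print(attempts, cluster.counter)
```

The body runs twice and the counter ends at 2 even though the application intended a single increment, which is the "execute more than once" effect described in this ticket.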
Distributed transactions that write to documents on exactly one shard and read documents from at least one other shard may execute more than once.
This affects 4.2.7 and earlier versions of 4.2.
The fix will be included in 4.4.0 and 4.2.8.
- related to: SERVER-48340 Re-enable single-write-shard transaction commit optimization