-
Type:
Bug
-
Resolution: Fixed
-
Priority:
Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Replication
-
Fully Compatible
-
ALL
-
v8.2, v8.1, v8.0, v7.0, v6.0
-
Repl 2025-06-23, Repl 2025-07-07, Repl 2025-07-21, Repl 2025-08-04
ISSUE DESCRIPTION AND IMPACT
MongoDB uses a two-phase commit protocol to handle customer cross-shard transactions. This protocol works in the following way:
- Prepare Phase: The transaction is prepared on all involved shards, ensuring that each shard is ready to commit.
- Commit Phase: Once all shards successfully prepare the transaction, a commit command is sent to all shards. The system waits for acknowledgments from all shards before confirming success to the client. At this point, the client expects the transaction's data to be committed across all shards.
The problem arises when a client explicitly sets {apiVersion} in their transaction, and during the two-phase commit process:
- The transaction reaches the prepare phase successfully.
- A failover event occurs on some of the shards (e.g., the primary on that shard steps down, a new primary is elected, or the same primary restarts and resumes).
- The shard that undergoes failover may return an "API Version Mismatch" error when the commit command is issued. This causes the transaction to remain in the "prepared" state on that shard.
- The two-phase commit coordinator misinterprets this error as a successful acknowledgment and incorrectly marks the transaction as committed. It then returns success to the client.
- Transactions left in the prepared state can block further write or read operations involving the affected documents (especially those with higher timestamps than the prepared transaction's timestamp).
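The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: `Shard`, `coordinate_commit`, and the `treat_errors_as_ack` flag are hypothetical names invented for illustration, and the failover is reduced to "the shard forgets the apiVersion it had remembered".

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TxnState(Enum):
    PREPARED = "prepared"
    COMMITTED = "committed"


class APIMismatchError(Exception):
    """Stand-in for the "API Version Mismatch" error (hypothetical class)."""


@dataclass
class Shard:
    name: str
    state: TxnState = TxnState.PREPARED
    # apiVersion the participant remembers for the transaction.
    api_version: Optional[str] = "1"

    def fail_over(self) -> None:
        # Toy failover: the prepared transaction survives, but the
        # remembered apiVersion does not.
        self.api_version = None

    def commit(self, api_version: str) -> None:
        if self.api_version != api_version:
            raise APIMismatchError(f"{self.name}: apiVersion mismatch")
        self.state = TxnState.COMMITTED


def coordinate_commit(shards: List[Shard], api_version: str,
                      treat_errors_as_ack: bool) -> bool:
    """Send commit to every shard; return what the client would be told."""
    for shard in shards:
        try:
            shard.commit(api_version)
        except APIMismatchError:
            if not treat_errors_as_ack:
                return False  # surface the failure instead of hiding it
            # Behaviour described in this ticket: the error is counted
            # as an acknowledgment and the loop simply continues.
    return True


if __name__ == "__main__":
    shards = [Shard("shardA"), Shard("shardB")]
    shards[1].fail_over()
    reported = coordinate_commit(shards, "1", treat_errors_as_ack=True)
    print("client told success:", reported)
    for s in shards:
        print(s.name, s.state.value)
```

With `treat_errors_as_ack=True` the client is told the commit succeeded while the failed-over shard is still in the prepared state, which is the inconsistency this ticket describes.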
Impact:
- Versions v8.0–v8.0.12: Due to a separate bug (SERVER-105751), prepared transactions may be "reaped" (removed) after a default timeout of 30 minutes (TransactionRecordMinimumLifetimeMinutes).
  - This potentially leaves the data in a torn state across shards, leading to logical data inconsistency where clients observe inconsistent transaction outcomes.
  - DIAGNOSIS: There is currently no way to diagnose this issue directly from the server.
  - REMEDIATION: No remediation can be performed directly on the server.
- When SERVER-105751 is not hit: The prepared transaction remains indefinitely.
  - DIAGNOSIS:
    - This causes persistent issues for subsequent operations, which may show up as frequent `writeConflict` errors when modifying documents involved in the prepared transaction.
    - It also causes unbounded growth of the oplog, because the prepared transaction blocks oplog truncation from advancing.
  - REMEDIATION:
    - If the commit/abort state of the transaction can be determined from other shards (via logs, oplog, or config.transactions) or from the client, manual intervention is required to abort or commit the blocked prepared transaction. However, if definitive data is unavailable, recovery cannot be guaranteed.
AFFECTED VERSIONS
- 5.0.0 - 5.0.31
- 6.0.0 - 6.0.26
- 7.0.0 - 7.0.25
- 8.0.0 - 8.0.15
- 8.2.0 - 8.2.1
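As a convenience, the ranges listed above can be checked mechanically. The following is a minimal sketch; `AFFECTED_RANGES`, `parse_version`, and `is_affected` are hypothetical helper names, and the ranges are copied from this list only, so anything not listed is reported as unaffected.

```python
# Inclusive (first, last) affected releases, copied from the list above.
AFFECTED_RANGES = [
    ((5, 0, 0), (5, 0, 31)),
    ((6, 0, 0), (6, 0, 26)),
    ((7, 0, 0), (7, 0, 25)),
    ((8, 0, 0), (8, 0, 15)),
    ((8, 2, 0), (8, 2, 1)),
]


def parse_version(text):
    """Turn 'v8.0.12' or '8.0.12-rc1' into a (major, minor, patch) tuple."""
    core = text.strip().lstrip("v").split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    # Pad short versions such as '8.0' with zeros so tuple comparison works.
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def is_affected(version):
    """Return True if the version falls inside one of the listed ranges."""
    v = parse_version(version)
    return any(first <= v <= last for first, last in AFFECTED_RANGES)


if __name__ == "__main__":
    for candidate in ["7.0.25", "7.0.26", "v8.0.12", "8.2.2"]:
        print(candidate, is_affected(candidate))
```

The version string would typically come from the server's reported version (for example the output of `db.version()` in the shell).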
------------------------------------------------------
Original description
A prepared transaction that was initiated with apiVersion set cannot be continued on a new primary after a failover. This is because we do not preserve apiParameters (such as apiVersion) during oplog application for prepared transactions. As a result, when the new primary takes over, it will have empty apiParameters.
When the transaction coordinator later sends the commit or abort decision to the new primary, the new primary will detect an APIMismatchError and assert. However, the coordinator will treat this error as an acknowledgement, leading to a situation where the distributed transaction may be committed on some shards but remain stuck in the prepared state on others.
This can result in a partially committed transaction after a failover, which is an unsafe state.
Given that apiParameters is saved as in-memory state, we could also hit this error if the same primary restarted and stepped up to become a primary after going through startup recovery.
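The mechanism described above can be modelled in a few lines. This is a simplified sketch and not server code: the entry shape produced by `to_oplog_entry` is a made-up reduction of a real oplog entry, and `PreparedTxn`, `recover_from_oplog`, and `continue_txn` are hypothetical names. The only point it illustrates is that state kept solely in memory is absent once the transaction is rebuilt from the oplog.

```python
from dataclasses import dataclass, field


class APIMismatchError(Exception):
    """Stand-in for the server's APIMismatchError (hypothetical class)."""


@dataclass
class PreparedTxn:
    txn_number: int
    ops: list
    # Held only in memory in this model; never written to the oplog entry.
    api_parameters: dict = field(default_factory=dict)


def to_oplog_entry(txn):
    # Simplified, hypothetical entry shape: api_parameters is not included.
    return {"txnNumber": txn.txn_number, "applyOps": list(txn.ops), "prepare": True}


def recover_from_oplog(entry):
    # A new primary, or the same node after startup recovery, rebuilds the
    # prepared transaction from the entry alone, so api_parameters is empty.
    return PreparedTxn(entry["txnNumber"], list(entry["applyOps"]))


def continue_txn(txn, request_api_parameters):
    # Transaction-continuing commands must carry the same parameters
    # the transaction was started with.
    if txn.api_parameters != request_api_parameters:
        raise APIMismatchError("apiParameters differ from the transaction's")
    return "ok"


if __name__ == "__main__":
    original = PreparedTxn(7, [{"op": "i"}], {"apiVersion": "1"})
    print(continue_txn(original, {"apiVersion": "1"}))

    recovered = recover_from_oplog(to_oplog_entry(original))
    try:
        continue_txn(recovered, {"apiVersion": "1"})
    except APIMismatchError as exc:
        print("after failover:", exc)
```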
- is caused by: SERVER-56550 Require Versioned API options for getMore and transaction-continuing commands (Closed)
- is fixed by: SERVER-108366 Prepared Transactions with apiVersion (Closed)
- is related to: SERVER-106141 Investigate Test Coverage for Commands Explicitly Setting apiVersion (Closed)
- related to: SERVER-105751 Node acting as router can reap TransactionParticipant that still has a prepared transaction on it (Closed)