ISSUE DESCRIPTION AND IMPACT
When a mongod instance runs as a router in v8.0, it cleans up sessions older than TransactionRecordMinimumLifetimeMinutes (default 30 minutes) without checking for prepared transactions. This can lead to prepared transactions being prematurely cleaned up if they persist longer than this configured value.
IMPACT:
1- Torn Cross-Shard Transaction:
If a transaction is reaped from the primary before its commit arrives, the primary responds with NoSuchTransaction. The coordinator treats this response as a successful commit, which silently tears the transaction and leaves very limited options for recovery.
- DIAGNOSIS: There is currently no way to diagnose this issue directly from the server.
- REMEDIATION: No remediation can be performed directly on the server.
Possible Scenarios for Prepared Transactions Remaining longer on primary:
- [Most likely] System Bugs:
- API version bug (SERVER-106075)
- [Most likely] Point-in-Time (PIT) Restore:
- A transaction was in the prepare state when the backup was taken. The system is later restored at a wall-clock time long after the TransactionRecordMinimumLifetimeMinutes has elapsed.
- [Less likely] Transaction Coordinator Problems:
- Transactions can be delayed by: coordinator primary failover; insufficient majority write concern for the decision; network issues; participant shard unavailability/slowness; frequent elections; storage engine/IO pressure on the coordinator; and replication lag on the coordinator shard.
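The torn-transaction outcome described under impact 1 can be illustrated with a toy model. This is a minimal sketch, not server code: `Participant`, `reap_session`, and `coordinator_commit` are hypothetical names, and the only behaviour taken from the description above is that a reaped participant answers NoSuchTransaction and the coordinator counts that answer as a successful commit.

```python
NO_SUCH_TRANSACTION = "NoSuchTransaction"
OK = "ok"


class Participant:
    """Hypothetical shard primary holding at most one prepared transaction."""

    def __init__(self, name):
        self.name = name
        self.prepared = None   # writes staged by prepare
        self.committed = []    # writes made durable by commit

    def prepare(self, writes):
        self.prepared = list(writes)

    def reap_session(self):
        # Models the premature cleanup: the prepared writes are discarded.
        self.prepared = None

    def commit(self):
        if self.prepared is None:
            return NO_SUCH_TRANSACTION
        self.committed.extend(self.prepared)
        self.prepared = None
        return OK


def coordinator_commit(participants):
    """Send commit to every participant; NoSuchTransaction counts as success."""
    responses = {p.name: p.commit() for p in participants}
    succeeded = all(r in (OK, NO_SUCH_TRANSACTION) for r in responses.values())
    return succeeded, responses


shard_a, shard_b = Participant("shardA"), Participant("shardB")
shard_a.prepare(["x=1"])
shard_b.prepare(["y=1"])
shard_b.reap_session()  # session reaped before the commit arrives

succeeded, responses = coordinator_commit([shard_a, shard_b])
print(succeeded)          # True
print(shard_a.committed)  # ['x=1']
print(shard_b.committed)  # []
```

In this model the coordinator reports success even though only one shard holds the transaction's writes, which is why nothing on the server side flags the problem.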
2- Secondary Crash (Recoverable on restart):
One consideration for the impact of this bug is that a secondary can now acknowledge a write once it has written the oplog entry, rather than waiting for the entry to be applied. The secondary could therefore have written the oplog entry for the commit and acknowledged it to its sync source while still hitting this bug. When that happens, the secondary crashes.
- DIAGNOSIS: The secondary node will crash.
- REMEDIATION: The node should recover upon restart and automatically resolve the issue, as the transaction will be re-prepared during startup using the config.transactions table.
Possible Scenarios for Prepared Transactions Remaining longer on secondary:
- [Most likely] System Bugs:
- Deadlock with DbHash and renameCollection (SERVER-103744)
- [Less likely] Secondary Lag.
- The secondary lags by more than TransactionRecordMinimumLifetimeMinutes after it has prepared the transaction and while it is waiting to apply the commit oplog entry.
AFFECTED VERSIONS
- 8.0.0 - 8.0.12
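As a convenience, the affected range above can be checked against a version string. This is a small sketch with hypothetical helper names (`parse_version`, `is_affected`); it assumes plain `major.minor.patch` strings and simply ignores any suffix such as `-rc0`, which may not match how pre-release builds should be treated.

```python
def parse_version(version):
    """Turn '8.0.12' into (8, 0, 12); a suffix such as '-rc0' is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version):
    """True when the version is within 8.0.0 - 8.0.12 inclusive."""
    return (8, 0, 0) <= parse_version(version) <= (8, 0, 12)


print(is_affected("8.0.4"))   # True
print(is_affected("8.0.13"))  # False
print(is_affected("7.0.20"))  # False
```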
------------------------------------------------------
Original description
When the node is acting as a router (because we add the router role to all nodes here), it enables the router code for session reaping. The reap function will look for session ids that have potentially expired based on the last time the session was checked out. It will then use that list of potentially expired sessions to check which ones have been removed from config.system.sessions by sending a find request to the primary.
Based on the logs from the repro, the session has been reaped from the primary already (since it receives commit acknowledgement when the secondary writes the commitTransaction oplog entry, not when it applies it). This means that the secondary will think that it can reap the session, and will check if the router thinks that the session can be reaped, but does not check if the transactionParticipant thinks that it can be reaped.
When the session gets reaped, it ends up destructing the transaction participant, which will destruct the txnResources, which ends up aborting the write unit of work that has the prepared updates. This results in a potential data inconsistency between this node and the other nodes in the replica set (and if it is serving reads, a potential torn multi-document transaction).
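The reaping decision described in the paragraphs above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the server implementation: `Session`, `reap`, and the `check_prepared` flag are hypothetical, `live_on_primary` stands in for the result of the find against config.system.sessions, and the guard shown is only one conceivable check, not necessarily the fix that was adopted.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical in-memory session entry on a single node."""
    lsid: str
    last_checkout_minutes_ago: int
    has_prepared_txn: bool = False


def reap_candidates(sessions, min_lifetime_minutes=30):
    """Sessions whose last checkout is older than the minimum lifetime."""
    return [s for s in sessions if s.last_checkout_minutes_ago > min_lifetime_minutes]


def reap(sessions, live_on_primary, min_lifetime_minutes=30, check_prepared=False):
    """Return the lsids that would be reaped on this node."""
    reaped = []
    for s in reap_candidates(sessions, min_lifetime_minutes):
        if s.lsid in live_on_primary:
            continue  # session record still exists on the primary, keep it
        if check_prepared and s.has_prepared_txn:
            continue  # participant-side check that the described path lacks
        reaped.append(s.lsid)
    return reaped


sessions = [
    Session("a", 45, has_prepared_txn=True),
    Session("b", 45),
    Session("c", 5),
]
unguarded = reap(sessions, live_on_primary=set())
guarded = reap(sessions, live_on_primary=set(), check_prepared=True)
print(unguarded)  # ['a', 'b']
print(guarded)    # ['b']
```

Without the participant-side check, session "a" is reaped even though it still holds a prepared transaction, which corresponds to the destruction of the transaction participant described above.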
- is related to:
- SERVER-103744 Deadlock between renameCollection, dbHash, and prepared transaction (Closed)
- SERVER-106075 Prepared Transactions with apiVersion Fail to Resume After Primary Failover (Closed)
- SERVER-106145 Evaluate Test Coverage for Short Session and Transaction Timeouts (Backlog)
- related to:
- SERVER-106051 Investigate reducing default timeout server parameters for Antithesis suites (Open)