[SERVER-36421] Commit transaction command does not properly abort the transaction if onTransactionCommit throws an exception Created: 02/Aug/18  Updated: 29/Oct/23  Resolved: 13/Aug/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.1.1
Fix Version/s: 4.1.2

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: prepare_basic
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-36295 Transaction metrics not updated on Tr... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2018-08-13, Repl 2018-08-27
Participants:
Linked BF Score: 28

 Description   

When we run commitTransaction, we call the Session::commitUnpreparedTransaction method (or the analogous method for prepared transactions). Inside this method, we first transition to state "kCommittingWithoutPrepare" and then trigger the onTransactionCommit OpObserver. That OpObserver call is where we write to update the config.transactions table. If the onTransactionCommit method throws an exception, the commitTransaction command fails and the transaction state is left in "kCommittingWithoutPrepare".

When a command running inside a transaction throws an exception, we trigger the error-handling block that aborts the transaction if necessary. In the case described, we would call Session::abortActiveTransaction while the transaction is still in state "kCommittingWithoutPrepare". Since that is not one of the expected states passed to _abortActiveTransaction, we do not execute the _abortTransactionOnSession method, which is what actually updates the various metadata about the transaction to indicate that it is aborted. We do, however, clean up the transaction resources that live on the OperationContext. So, even though we called abortActiveTransaction, we never actually transition to the "kAborted" state.
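The failure sequence above can be illustrated with a deliberately simplified C++ model. Everything here is hypothetical: `SketchSession`, `TxnState`, `runFailingCommit`, and the two boolean flags are stand-ins named after the issue text, not the real MongoDB `Session` class. For illustration it assumes the command-level handler passes only `kInProgress` as an expected state. The sketch shows that resources get released while the abort metadata is never recorded.

```cpp
#include <cassert>
#include <initializer_list>
#include <stdexcept>

// Hypothetical, heavily simplified transaction states (not the real server enum).
enum class TxnState { kInProgress, kCommittingWithoutPrepare, kCommitted, kAborted };

struct SketchSession {
    TxnState state = TxnState::kInProgress;
    bool resourcesReleased = false;  // stands in for resources on the OperationContext
    bool abortRecorded = false;      // stands in for metadata set by _abortTransactionOnSession

    // Pre-fix ordering: transition first, then run the observer, which may throw.
    template <typename Observer>
    void commitUnpreparedTransaction(Observer onTransactionCommit) {
        state = TxnState::kCommittingWithoutPrepare;
        onTransactionCommit();  // e.g. the config.transactions write
        state = TxnState::kCommitted;
    }

    // Resources are always released, but the real abort only runs from an expected state.
    void abortActiveTransaction(std::initializer_list<TxnState> expectedStates) {
        resourcesReleased = true;
        for (TxnState expected : expectedStates) {
            if (state == expected) {
                abortRecorded = true;  // models _abortTransactionOnSession
                state = TxnState::kAborted;
                return;
            }
        }
        // Not in an expected state: no abort metadata is written.
    }
};

// Simulates commitTransaction failing inside the observer, followed by the
// command-level handler that aborts the transaction "if necessary".
TxnState runFailingCommit(SketchSession& session) {
    try {
        session.commitUnpreparedTransaction(
            [] { throw std::runtime_error("config.transactions write failed"); });
    } catch (const std::exception&) {
        session.abortActiveTransaction({TxnState::kInProgress});
    }
    return session.state;
}
```

Calling `runFailingCommit` on a fresh `SketchSession` leaves it in `kCommittingWithoutPrepare` with `resourcesReleased` set but `abortRecorded` still false, which is the inconsistency the description reports.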

The issue can then persist, because the transaction has been left in the "kCommittingWithoutPrepare" state. For example, if we try to run another commit command, we get an error because the transaction is no longer marked as in progress. The same error is also thrown if we try to run abort. One way to get the transaction out of this "limbo" state is to start a new transaction with a higher transaction number on the same session. This clears out the old transaction state, but it still won't trigger an actual call to _abortTransactionOnSession for the previous transaction: when we start a new transaction while an older one exists, we only abort the old transaction if it is in progress. This means we would start the new transaction without ever explicitly calling the abort method internally.
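A small hypothetical model of the "limbo" behaviour follows. `LimboSession`, `abortCalls`, and the `throws` helper are invented for illustration, and the error text is a placeholder rather than the server's actual error code. It assumes, as the paragraph above states, that commit and abort both require an in-progress transaction and that starting a higher transaction number only aborts the old one when it is in progress.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified transaction states (not the real server enum).
enum class TxnState { kInProgress, kCommittingWithoutPrepare, kCommitted, kAborted };

// Hypothetical session that starts in the state left behind by the failed commit.
struct LimboSession {
    long long txnNumber = 1;
    TxnState state = TxnState::kCommittingWithoutPrepare;
    int abortCalls = 0;  // counts real aborts (models _abortTransactionOnSession)

    void requireInProgress() const {
        if (state != TxnState::kInProgress) {
            throw std::logic_error("transaction is not in progress");  // placeholder error
        }
    }
    void commit() { requireInProgress(); state = TxnState::kCommitted; }
    void abort() { requireInProgress(); ++abortCalls; state = TxnState::kAborted; }

    // A higher txnNumber replaces the old transaction, aborting it only if in progress.
    void beginTransaction(long long newTxnNumber) {
        if (newTxnNumber <= txnNumber) {
            throw std::logic_error("txnNumber must increase");
        }
        if (state == TxnState::kInProgress) {
            ++abortCalls;
        }
        txnNumber = newTxnNumber;
        state = TxnState::kInProgress;
    }
};

// Returns true if calling f throws a std::exception.
template <typename F>
bool throws(F f) {
    try {
        f();
    } catch (const std::exception&) {
        return true;
    }
    return false;
}
```

In this model both `commit()` and `abort()` fail from the limbo state, and `beginTransaction(2)` succeeds while `abortCalls` stays at zero, mirroring how the old transaction is discarded without an explicit abort.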

To fix this, we should probably make sure that we explicitly abort the transaction right away if an exception is thrown inside the OpObserver.
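One way to read the proposed fix is to catch any exception from the observer inside the commit method, abort immediately, and rethrow so the command still fails. The sketch below assumes that shape; `SketchSession` and `abortTransactionOnSession` are hypothetical stand-ins, and this is only the approach suggested in the description, not necessarily the change that was merged.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified transaction states (not the real server enum).
enum class TxnState { kInProgress, kCommittingWithoutPrepare, kCommitted, kAborted };

struct SketchSession {
    TxnState state = TxnState::kInProgress;
    int abortCalls = 0;

    // Stand-in for the method that records the abort in transaction metadata.
    void abortTransactionOnSession() {
        ++abortCalls;
        state = TxnState::kAborted;
    }

    template <typename Observer>
    void commitUnpreparedTransaction(Observer onTransactionCommit) {
        state = TxnState::kCommittingWithoutPrepare;
        try {
            onTransactionCommit();
        } catch (...) {
            abortTransactionOnSession();  // abort right away, from any state
            throw;                        // the commit command still reports the error
        }
        state = TxnState::kCommitted;
    }
};
```

The trade-off is that the commit path needs its own cleanup logic; the command-level handler's later abort attempt would then find the transaction already in `kAborted`.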



 Comments   
Comment by Githook User [ 13/Aug/18 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'}

Message: SERVER-36421 Keep being InProgress when writing oplog entry for unprepared commit.
Branch: master
https://github.com/mongodb/mongo/commit/3fc7971ea53270cccf5bf164862186e19de9a185

Comment by Siyuan Zhou [ 02/Aug/18 ]

Looked into this with Will. We believe this isn't a problem in 4.0 because the transaction state is still `kInProgress` when we call the OpObserver. We moved the transition to kCommitting earlier in 4.2. I think it's correct to consider the transaction in progress when committing an unprepared transaction calls the OpObserver, since it hasn't touched anything on disk and all data changes are in the WUOW (WriteUnitOfWork), which hasn't committed yet.
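The ordering described in this comment and in the commit message ("Keep being InProgress when writing oplog entry for unprepared commit") can be sketched as follows. This is a simplified, hypothetical model: `SketchSession` and `dataCommitted` are invented, and the flag only stands in for committing the WriteUnitOfWork. The point is that the observer runs while the state is still `kInProgress`, so the ordinary abort path applies if it throws.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified transaction states (not the real server enum).
enum class TxnState { kInProgress, kCommittingWithoutPrepare, kCommitted, kAborted };

struct SketchSession {
    TxnState state = TxnState::kInProgress;
    bool dataCommitted = false;  // stands in for committing the WriteUnitOfWork

    template <typename Observer>
    void commitUnpreparedTransaction(Observer onTransactionCommit) {
        // The observer runs while still kInProgress. Its writes belong to the
        // uncommitted WUOW, so a throw here leaves nothing on disk.
        onTransactionCommit();
        state = TxnState::kCommittingWithoutPrepare;
        dataCommitted = true;
        state = TxnState::kCommitted;
    }

    // The ordinary abort path, which succeeds because the state is kInProgress.
    void abortActiveTransaction() {
        if (state == TxnState::kInProgress) {
            state = TxnState::kAborted;
        }
    }
};
```

With this ordering no special-case catch block is needed in the commit method: a failing observer leaves the session in `kInProgress`, and the existing command-level abort moves it to `kAborted`.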

Comment by William Schultz (Inactive) [ 02/Aug/18 ]

siyuan.zhou judah.schvimer This was discovered after looking into an apparent issue with transaction metrics reported by Bruce Lucas. After discussing with Siyuan, however, I think the core bug is not related to the transaction metrics logic, but to how we track and update transaction state in the session.

Generated at Thu Feb 08 04:43:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.