[SERVER-35730] Assert replica set transactions cannot fail to commit their WriteUnitOfWork rather than abort Created: 21/Jun/18  Updated: 29/Oct/23  Resolved: 21/Sep/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.4

Type: Task Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: prepare_testing
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-35865 Write commit oplog entry on commit of... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2018-08-27, Repl 2018-09-10, Repl 2018-09-24
Participants:

 Description   

When we try to commit a single replica set transaction, we call commit on its underlying WriteUnitOfWork. If this commit call fails, we will then mark the transaction as aborted. We currently don't have any unit or integration tests that exercise this case. We could consider testing this by simply mocking out a behavior of the OperationContext so that we throw an exception before we set the committed flag to true.



 Comments   
Comment by Githook User [ 21/Sep/18 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-35865 SERVER-35816 SERVER-35730 Write commit oplog entry on commit of prepared transaction
Branch: master
https://github.com/mongodb/mongo/commit/340e33483d69983c976dff0ed5ab2ff0b036237c

Comment by Judah Schvimer [ 14/Sep/18 ]

This ticket will be committed in the same patch as SERVER-35865.

Comment by Siyuan Zhou [ 06/Sep/18 ]

Talked with Dan, I confirmed that WiredTiger detects write conflict on write time rather than commit time, so we expect commit() to succeed unless there are some fatal failures. If there's another storage engine that detects write conflicts on commit, we may need to prepare for that, but this issue won't be the only surprise for us. Thus I'm changing the ticket to remove the abortGuard to reflect the real contract.

Comment by Siyuan Zhou [ 15/Aug/18 ]

I agree it's nice to assume storage engine can throw a WCE on on an unprepared WUOW.commit, but we cannot test the exception handling code properly in that way. If we decide to keep the current behavior not to throw, I'd suggest changing the title of this ticket to remove the exception handling code.

Comment by Daniel Gottlieb (Inactive) [ 14/Aug/18 ]

Welp, I also thought the WUOW.commit method could throw a WCE if WT's commit returned WT_ROLLBACK, but william.schultz's linked code certainly catches that and decides to terminate. Terminating when WT fails to commit (as opposed to only terminating with the registered changes throw an exception) seems to date back to WT's introduction in 3.0.

From a hypothetical API perspective, I would say a storage engine can throw a WCE on an unprepared WUOW.commit and catching a WCE implies the "rollback" changes were processed successfully. However, unsurprisingly, a storage engine must succeed in committing a prepared transaction. But confirming what the WT code does today and in 4.0, I don't see a WCE ever escaping a WUOW.commit call.

Comment by Siyuan Zhou [ 14/Aug/18 ]

geert.bosch and daniel.gottlieb, as Will mentioned earlier, commit() seems to treat all errors on commit as fatal, but I remembered the assumption in 4.0 was commit() can throw for unprepared transactions. Could you please confirm whether commit() can throw for both unprepared and prepared transactions?

Comment by William Schultz (Inactive) [ 14/Aug/18 ]

I think this is still worthwhile to test, even with new changes being made as part of prepare work. This is the relevant code block to execute now. judah.schvimer I am moving it into the prepare epic with a "testing" label. Writing a unit test that simulates an exception being thrown by WriteUnitOfWork::commit should probably suffice.

Comment by William Schultz (Inactive) [ 21/Jun/18 ]

Although the current implementation of WiredTiger commit may treat all errors as fatal, this is not a generic behavior that should be relied upon for code calling into the storage engine. It should be expected that a call to commit may fail (throw an exception), and so calling code should be robust to this. Thus, the code block mentioned in the previous comment should be retained, and it could be valuable to test it.

Generated at Thu Feb 08 04:40:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.