[SERVER-35730] Assert replica set transactions cannot fail to commit their WriteUnitOfWork rather than abort Created: 21/Jun/18 Updated: 29/Oct/23 Resolved: 21/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.4 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_testing | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Repl 2018-08-27, Repl 2018-09-10, Repl 2018-09-24 | ||||||||
| Participants: | |||||||||
| Description |
|
When we try to commit a single replica set transaction, we call commit on its underlying WriteUnitOfWork. If this commit call fails, we will then mark the transaction as aborted. We currently don't have any unit or integration tests that exercise this case. We could consider testing this by simply mocking out a behavior of the OperationContext so that we throw an exception before we set the committed flag to true. |
| Comments |
| Comment by Githook User [ 21/Sep/18 ] |
|
Author: {'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}Message: |
| Comment by Judah Schvimer [ 14/Sep/18 ] |
|
This ticket will be committed in the same patch as |
| Comment by Siyuan Zhou [ 06/Sep/18 ] |
|
Talked with Dan, I confirmed that WiredTiger detects write conflict on write time rather than commit time, so we expect commit() to succeed unless there are some fatal failures. If there's another storage engine that detects write conflicts on commit, we may need to prepare for that, but this issue won't be the only surprise for us. Thus I'm changing the ticket to remove the abortGuard to reflect the real contract. |
| Comment by Siyuan Zhou [ 15/Aug/18 ] |
|
I agree it's nice to assume storage engine can throw a WCE on on an unprepared WUOW.commit, but we cannot test the exception handling code properly in that way. If we decide to keep the current behavior not to throw, I'd suggest changing the title of this ticket to remove the exception handling code. |
| Comment by Daniel Gottlieb (Inactive) [ 14/Aug/18 ] |
|
Welp, I also thought the WUOW.commit method could throw a WCE if WT's commit returned WT_ROLLBACK, but william.schultz's linked code certainly catches that and decides to terminate. Terminating when WT fails to commit (as opposed to only terminating with the registered changes throw an exception) seems to date back to WT's introduction in 3.0. From a hypothetical API perspective, I would say a storage engine can throw a WCE on an unprepared WUOW.commit and catching a WCE implies the "rollback" changes were processed successfully. However, unsurprisingly, a storage engine must succeed in committing a prepared transaction. But confirming what the WT code does today and in 4.0, I don't see a WCE ever escaping a WUOW.commit call. |
| Comment by Siyuan Zhou [ 14/Aug/18 ] |
|
geert.bosch and daniel.gottlieb, as Will mentioned earlier, commit() seems to treat all errors on commit as fatal, but I remembered the assumption in 4.0 was commit() can throw for unprepared transactions. Could you please confirm whether commit() can throw for both unprepared and prepared transactions? |
| Comment by William Schultz (Inactive) [ 14/Aug/18 ] |
|
I think this is still worthwhile to test, even with new changes being made as part of prepare work. This is the relevant code block to execute now. judah.schvimer I am moving it into the prepare epic with a "testing" label. Writing a unit test that simulates an exception being thrown by WriteUnitOfWork::commit should probably suffice. |
| Comment by William Schultz (Inactive) [ 21/Jun/18 ] |
|
Although the current implementation of WiredTiger commit may treat all errors as fatal, this is not a generic behavior that should be relied upon for code calling into the storage engine. It should be expected that a call to commit may fail (throw an exception), and so calling code should be robust to this. Thus, the code block mentioned in the previous comment should be retained, and it could be valuable to test it. |