[SERVER-41062] Always return TransactionTooLarge rather than BSONObjectTooLarge when transaction is too large to fit in a single applyOps Created: 08/May/19 Updated: 29/Oct/23 Resolved: 02/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.9 |
| Fix Version/s: | 4.0.11 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jason Chan | Assignee: | Siyuan Zhou |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Repl 2019-07-01, Repl 2019-07-15 | ||||||||
| Participants: | |||||||||
| Description |
|
We should backport the moving of the try clause in onTransactionCommit up to also cover the opsArray.done() call since it is possible for that call to throw a BSONObjectTooLarge exception, as we do in The call in question on the 4.0 branch is here: https://github.com/mongodb/mongo/blob/ed33d2f8e66cd907ef13ee162f6a1dac3ff25672/src/mongo/db/op_observer_impl.cpp#L918 |
| Comments |
| Comment by Githook User [ 02/Jul/19 ] | ||||
|
Author: {'name': 'Siyuan Zhou', 'email': 'visualzhou@gmail.com', 'username': 'visualzhou'}Message: | ||||
| Comment by Siyuan Zhou [ 13/May/19 ] | ||||
|
Thanks, william.schultz for the clarification. Even if getDurableReplOperationSize() overestimates the size, it'll still be a subtle assumption to rely on. To code defensively and to make the code easier to reason about, it's still better to make the change in 4.0. I think it's fine to just extend try-catch to cover more lines without a targeted test. That's easier than guaranteeing the overestimation. However as you mentioned offline, its priority is low. | ||||
| Comment by William Schultz (Inactive) [ 13/May/19 ] | ||||
|
To clarify, my hypothesis was that the size calculation made by repl::OplogEntry::getDurableReplOperationSize overestimates by enough bytes so that by the time we place the operation into a BSONArray inside an applyOps object, we would have already failed inside TransactionParticipant. I think the thing to investigate would be to compare the size estimate returned by repl::OplogEntry::getDurableReplOperationSize for a given operation with the actual size needed to place it inside an applyOps object subarray. | ||||
| Comment by Siyuan Zhou [ 13/May/19 ] | ||||
|
According to the comment, we don't assumeĀ repl::OplogEntry::getReplOperationSize() gives an overestimate of the actual size, so we need to have try-catch to cover array.done(), which is what the try-catch was supposed to cover.
| ||||
| Comment by Jason Chan [ 09/May/19 ] | ||||
|
After discussing with william.schultz, this backport might not be necessary since we check the transaction size here https://github.com/mongodb/mongo/blob/76912f3714ddcd82a3b53e1ecebba0044ac13d07/src/mongo/db/session.cpp#L1071 before entering the Op Observer onCommit call. The bug existed in 4.2 because of the specific edge case where we start a 4.2 style transaction with no size limit, and then perform a downgrade and try to commit that large transaction. At the point of commit, opsArray.done() will end up trying to fit the large transaction into a single applyOps and throw BSONObjectTooLarge. We don't expect this to happen in 4.0 since we would be unable to create a large transaction in the first place. We should instead be hitting the error in addTransactionOperation before ever reaching the onTransactionCommit. |