[SERVER-43889] Distinguish between a retryable write and a transaction when failing a command Created: 03/Oct/19 Updated: 29/Oct/23 Resolved: 18/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.6, 4.3.4, 4.0.19 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Marek Kresnicki | Assignee: | Ali Mir |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Minor Change |
| Backport Requested: | v4.2, v4.0 |
| Sprint: | Repl 2020-02-10, Repl 2020-02-24 |
| Participants: |
| Description |
|
Currently, most txnNumber errors refer only to transactions, even if the error pertains to a retryable write. This can create a poor user experience for someone who is only using retryable writes. We should audit these types of errors on the server and determine whether the command was a part of a retryable write or a transaction. Depending on the answer, we can return one of the following errors:

"Cannot start transaction X on session <UUID> because a newer transaction or retryable write with txnNumber Y has already started on this session."

"Retryable write with txnNumber X is prohibited on session <UUID> because a newer transaction or retryable write with txnNumber Y has already started on this session."

ORIGINAL POST:

Code that reproduces the issue:
When running the above code, I get the exception: MongoCommandException: Command update failed: Cannot start transaction 1 on session c4fd4081-e0f0-40d6-84fc-cdfeccc74e3b - hf9LJrhq2lfp667gnURrBVE7MCUS1NJZYDmWlqfKWl0= because a newer transaction 3 has already started.. This is misleading, as there is no explicit transaction in progress. I know now that we should use either
Background: I encountered this issue in production code where one method is used both with and without a transaction. Our solution was simply to create an intermediate overloaded method that creates a session for you and passes it into the method that accepts the session. |
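As a rough illustration of the two messages proposed in the description, the following Python sketch picks the wording based on whether the failed command was a retryable write or a transaction. The function name `stale_txn_number_message` and its parameters are hypothetical and are not taken from the server code; only the message text comes from this ticket.

```python
def stale_txn_number_message(is_retryable_write, txn_number, session_id, active_txn_number):
    """Build the error text for a command whose txnNumber is older than the session's.

    Hypothetical helper: the name and parameters are illustrative only.
    The two message templates are the ones quoted in the ticket description.
    """
    # Both messages share the same explanation of why the command was rejected.
    reason = (
        "because a newer transaction or retryable write with txnNumber "
        f"{active_txn_number} has already started on this session."
    )
    if is_retryable_write:
        return (
            f"Retryable write with txnNumber {txn_number} is prohibited "
            f"on session {session_id} {reason}"
        )
    return f"Cannot start transaction {txn_number} on session {session_id} {reason}"


# Example: a retried write with txnNumber 1 arrives after txnNumber 3 has started.
print(stale_txn_number_message(True, 1, "<UUID>", 3))
```

With this split, a user who only issues retryable writes would see a message that names retryable writes rather than a transaction they never started.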
| Comments |
| Comment by Githook User [ 07/Apr/20 ] |
|
Author: {'name': 'Ali Mir', 'email': 'ali.mir@mongodb.com', 'username': 'ali-mir'}
Message: (cherry picked from commit eb284b042c71edf0eac445d3ceb79f7fdeabc5d1) |
| Comment by Githook User [ 31/Mar/20 ] |
|
Author: {'name': 'Ali Mir', 'email': 'ali.mir@mongodb.com', 'username': 'ali-mir'}
Message: (cherry picked from commit eb284b042c71edf0eac445d3ceb79f7fdeabc5d1) |
| Comment by Githook User [ 18/Feb/20 ] |
|
Author: {'name': 'Ali Mir', 'username': 'ali-mir', 'email': 'ali.mir@mongodb.com'}
Message: |
| Comment by Pavithra Vetriselvan [ 09/Dec/19 ] |
|
Hi eric.sedor, thank you for fixing the comments! I do agree that we can provide a better user experience here but think it will require a joint effort between the Server and Drivers teams. To aid discussion, I am moving this conversation to email. |
| Comment by Eric Sedor [ 20/Nov/19 ] |
|
Unfortunately, I don't think that will help with the confusion this message causes... Is there any way to indicate that the error is happening because a write is being retried? Ideally we could distinguish between
|
| Comment by Pavithra Vetriselvan [ 19/Nov/19 ] |
|
eric.sedor, do you think changing the error message to what I suggested above is an adequate fix here? |
| Comment by Pavithra Vetriselvan [ 11/Nov/19 ] |
|
Hi eric.sedor, I think SERVER-35411 focuses more on remembering why a transaction aborts so that we can more reliably deliver a ‘TransientTransactionError’ label to drivers. We throw with the error mentioned in this ticket because we are trying to run an operation with a lower txnNumber than the current txnNumber. When running retryable writes concurrently on the same session, we can imagine a scenario where write 1 (txnNumber = 1) fails, write 2 (txnNumber = 2) starts and overwrites the txnNumber for the session, and write 1 (txnNumber = 1) retries. Since we're explicitly checking the txnNumber, which is used for retryable writes and transactions, one solution could be changing the error message to: |
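The interleaving described in this comment can be replayed with a toy model of the session's txnNumber check. This is a sketch only: `ToySession`, `StaleTxnNumber`, and `replay_interleaving` are made-up names, the check is simplified to "reject any txnNumber lower than the one the session currently holds", and the error text is a placeholder. It does not reproduce the server's implementation or the reworded message suggested in the comment.

```python
class StaleTxnNumber(Exception):
    """Raised when an operation arrives with a txnNumber older than the session's."""


class ToySession:
    """Hypothetical stand-in for per-session state; tracks only the newest txnNumber."""

    def __init__(self):
        self.active_txn_number = None

    def begin_or_continue(self, txn_number):
        active = self.active_txn_number
        # The same counter serves retryable writes and transactions,
        # so the check cannot tell which kind of operation was rejected.
        if active is not None and txn_number < active:
            raise StaleTxnNumber(
                f"txnNumber {txn_number} is older than active txnNumber {active}"
            )
        self.active_txn_number = txn_number


def replay_interleaving():
    """Replay: write 1 fails, write 2 starts, write 1 retries on the same session."""
    session = ToySession()
    session.begin_or_continue(1)      # write 1, first attempt (assume it then fails)
    session.begin_or_continue(2)      # write 2 starts and overwrites the txnNumber
    try:
        session.begin_or_continue(1)  # write 1 retries with its original txnNumber
    except StaleTxnNumber as exc:
        return str(exc)
    return None


print(replay_interleaving())
```

In this model no transaction is ever started, yet the retry of write 1 is rejected, which is the situation the reporter ran into.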
| Comment by Eric Sedor [ 25/Oct/19 ] |
|
Does SERVER-35411 help us be more specific here? |
| Comment by Eric Sedor [ 10/Oct/19 ] |
|
We wanted to let you know we are looking into this. Thanks in advance for your patience. |
| Comment by Marek Kresnicki [ 03/Oct/19 ] |
|
Forgot to mention that this has been working fine in version 2.8.1 |