[SERVER-67820] SEPTransactionClient::runCRUDOp should inspect writeError from response Created: 06/Jul/22 Updated: 29/Oct/23 Resolved: 11/Aug/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Sanika Phanse (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam3, txn-api-improvements | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Sharding 2022-08-08, Sharding 2022-08-22 | ||||
| Participants: | |||||
| Description |
|
Currently it only checks the top level error fields from the command response here. It should be calling getStatusFromWriteCommandReply instead. |
| Comments |
| Comment by Jack Mulrow [ 22/Jul/22 ] |
|
That argument sounds reasonable to me. Going ahead with the change described in this ticket SGTM. |
| Comment by Randolph Tan [ 11/Jul/22 ] |
|
jack.mulrow@mongodb.com, I think this would make the api more cumbersome to use. This is because the caller would be required to check for write errors for every call to runCRUDOp even though the underlying transaction has already been aborted. If the caller didn't check for write errors, this can also end up getting stuck retrying because the api would try to do txn stuff on a txnNumber that was already aborted and get a txnNumber no longer exist error, retry from the beginning and get the same error again and again. |
| Comment by Jack Mulrow [ 08/Jul/22 ] |
|
This was actually the intended behavior. I wasn't sure if users of the API could hit cases with either expected errors (like WouldChangeOwningShard) or multiple write errors, so I didn't want runCRUDOp to always throw on any write error and deferred that to the caller instead. I figured always checking the top-level error was fine though since that should mean the whole command failed, which I couldn't imagine being an expected error. That said, this might not have been the right choice and I'm open to changing it (or maybe making this behavior more obvious somehow), I just wanted to give some context. |