[SERVER-38583] Transactional bulkWrite error missing writeErrors (mongod) Created: 12/Dec/18 Updated: 29/Oct/23 Resolved: 12/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Write Ops |
| Affects Version/s: | 4.0.4 |
| Fix Version/s: | 4.1.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vick Mena (Inactive) | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
|||||||||||||||||||||||||||||
| Backwards Compatibility: | Major Change | |||||||||||||||||||||||||||||
| Operating System: | ALL | |||||||||||||||||||||||||||||
| Backport Requested: |
v4.0
|
|||||||||||||||||||||||||||||
| Steps To Reproduce: | Start a 3-node replica set.
I'm using the following test code
|
|||||||||||||||||||||||||||||
| Sprint: | Repl 2019-01-14, Repl 2019-01-28, Repl 2019-02-11, Repl 2019-02-25 | |||||||||||||||||||||||||||||
| Participants: | ||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | |||||||||||||||||||||||||||||
| Description |
|
When testing the bulkWrite duplicate-key example from https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/#bulk-write-operations in a transaction, I can't find the writeErrors array in the result. Server: 4.0.4-ent
Thinking this might be a client-side issue, I looked at the wire protocol response, and it isn't there either.
How do I identify the failed items? |
| Comments |
| Comment by A. Jesse Jiryu Davis [ 21/Feb/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Note: do not backport. This is a behavior change, best to do it in 4.2 rather than 4.0.x. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 12/Feb/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 12/Feb/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 09/Jan/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, it is my plan to make multi-document transaction write errors include ok: 1. Drivers were written with the assumption that errors from bulk writes have the same structure whether in a transaction or not, so I'd like to make that assumption true. I see your point, that if the 50th document in a bulk insert fails, then in a transaction the previous 49 inserts were rolled back. But that's the same today if the application does writes one by one:
Here, too, it's the application's job to understand that the successful writes have been rolled back. So I think the goal should be to make bulk write error reporting the same, whether in a transaction or not. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
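The point about one-by-one writes can be illustrated with a toy model. This is a minimal sketch in Python, assuming an in-memory dict in place of a collection; `transaction`, `insert_one`, and `run_one_by_one` are hypothetical names for illustration and are not MongoDB or driver APIs. The application sees two inserts succeed before the third fails, yet nothing is persisted once the transaction aborts.

```python
from contextlib import contextmanager


class DuplicateKey(Exception):
    """Raised by the toy store when an _id already exists."""


@contextmanager
def transaction(store):
    """Toy transaction: restore the store's prior contents on any error."""
    snapshot = dict(store)
    try:
        yield store
    except Exception:
        store.clear()
        store.update(snapshot)
        raise


def insert_one(store, doc):
    """Insert a single document, failing on a duplicate _id."""
    if doc["_id"] in store:
        raise DuplicateKey(f"duplicate _id {doc['_id']!r}")
    store[doc["_id"]] = doc


def run_one_by_one(store, docs):
    """Insert docs one at a time in a transaction.

    Returns the _ids the application saw succeed before any failure.
    """
    acknowledged = []
    try:
        with transaction(store) as txn:
            for doc in docs:
                insert_one(txn, doc)
                acknowledged.append(doc["_id"])
    except DuplicateKey:
        pass  # the transaction aborted; earlier inserts were rolled back
    return acknowledged
```

With a store that already holds `_id: 3`, calling `run_one_by_one(store, [{"_id": 1}, {"_id": 2}, {"_id": 3}])` returns `[1, 2]`, but the store still contains only the original document. The application has to treat those acknowledged writes as undone.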
| Comment by Eric Milkie [ 09/Jan/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
You didn't say so explicitly, but are you planning to make the multi-document transaction return ok:1 instead of ok:0? I don't think we can change how the ok field is presented here. With non-transactional batch commands, it is entirely possible that some writes succeed and others do not (the failing ones then appear in the writeErrors array), thus ok:1. With a batch command inside a transaction, any write failure means none of the writes in the batch can succeed and they all roll back, thus ok:0. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 09/Jan/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Right, only inserts currently have difficulty determining which document in the batch failed when part of a multi-document transaction. However, all write commands exhibit the same behavior: outside a multi-document transaction they return ok: 1 and include writeErrors if they fail, but in a transaction they return ok: 0 and omit writeErrors. I think this is a bug: they should have the same error format whether or not they're in a multi-document transaction. I propose to update this storage API to indicate which record in the vector failed (the method will quit on the first error). Currently it's:
It will become:
From there, I can update the error handling for all write commands in write_ops_exec.cpp to fix the error format for transactional writes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
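Until the two formats are unified, client code that inspects raw replies has to handle both shapes described in this comment. The following is a minimal sketch in Python, not driver code: `extract_write_errors` is a hypothetical helper, and the sample replies in the usage note are illustrative values, not output captured from a server. It relies only on the reply fields named in this ticket (`ok`, `writeErrors`) plus the top-level `code` and `errmsg` fields of a failed command reply.

```python
def extract_write_errors(reply):
    """Return a list of error dicts from either reply shape.

    Non-transactional shape: ok is 1 and failures are listed in writeErrors.
    Transactional shape (as reported here): ok is 0 and the error is at the
    top level, with no index identifying the failed operation.
    """
    write_errors = reply.get("writeErrors")
    if write_errors:
        return [dict(err) for err in write_errors]
    if reply.get("ok") != 1:
        return [{
            "index": None,  # unknown: the reply does not say which op failed
            "code": reply.get("code"),
            "errmsg": reply.get("errmsg"),
        }]
    return []
```

For example, `extract_write_errors({"ok": 1, "n": 1, "writeErrors": [{"index": 1, "code": 11000, "errmsg": "dup"}]})` yields one entry with `index` 1, while `extract_write_errors({"ok": 0, "code": 11000, "errmsg": "dup"})` yields one entry whose `index` is `None`.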
| Comment by Eric Milkie [ 09/Jan/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I believe only the insert command works this way; update and delete do not do this – is that correct? | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 09/Jan/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here's a regular error:
Versus an error in a transaction:
The error in the transaction has ok: 0 and lacks writeErrors, as well as opTime and electionId. The error info is moved to the top level of the reply.
Theory: outside a transaction, insertBatchAndHandleErrors tries a bulk write two ways: first all at once, efficiently; if that fails, it tries each operation in the bulk write individually to determine which one(s) failed. Then serializeReply() assembles the reply with ok: 1 and writeErrors seen above. In a transaction, insertBatchAndHandleErrors can't use that technique (we'd need nested transactions or savepoints in WiredTiger), so it throws an error without knowing which records failed. The exception bubbles up past where serializeReply() would have been called and is handled much higher, perhaps in execCommandDatabase(), which marks the command ok: 0. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
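The theory above can be modeled in a few lines. This is a toy Python sketch under stated assumptions, not the server's C++ code: `insert_all` and `insert_batch` are hypothetical stand-ins, a dict plays the role of the collection, and only duplicate-key failures (error code 11000) are modeled.

```python
class DuplicateKey(Exception):
    """Raised by the toy storage layer when an _id already exists."""


def insert_all(store, docs):
    """Fast path: insert every document, or none if any _id collides."""
    ids = [doc["_id"] for doc in docs]
    if len(set(ids)) != len(ids) or any(i in store for i in ids):
        raise DuplicateKey("duplicate key in batch")
    for doc in docs:
        store[doc["_id"]] = doc


def insert_batch(store, docs, in_transaction=False, ordered=True):
    """Try the whole batch first; outside a transaction, retry one by one."""
    try:
        insert_all(store, docs)
        return {"ok": 1, "n": len(docs)}
    except DuplicateKey:
        if in_transaction:
            # No savepoints to retry against: the exception propagates
            # without identifying which document failed.
            raise
    # Slow path: retry each document individually to locate the failure(s).
    n, write_errors = 0, []
    for index, doc in enumerate(docs):
        try:
            insert_all(store, [doc])
            n += 1
        except DuplicateKey:
            write_errors.append({
                "index": index,
                "code": 11000,
                "errmsg": f"duplicate key: _id {doc['_id']!r}",
            })
            if ordered:
                break  # an ordered batch stops at the first error
    return {"ok": 1, "n": n, "writeErrors": write_errors}
```

Outside a transaction the caller gets a reply naming the failing index. With `in_transaction=True` the same batch raises, and a caller higher up would only be able to report a top-level failure.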
| Comment by Craig Homa [ 20/Dec/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey Repl, the Query team feels that it would be best for you to help with this. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 14/Dec/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Note that for 4.2 this message will be better because of | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 14/Dec/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Or are you saying you want to see the full document that caused the failure? | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 14/Dec/18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you clarify something for me? The error message contains the key value that caused the error, and the transaction aborts on the first error, so what information is lost? You won't ever have failed itemS, only a single failed item, no? |