[DRIVERS-2093] How should drivers handle multiple WriteConcernErrors in a bulk operation Created: 08/Jan/19 Updated: 11/Oct/22 |
|
| Status: | Backlog |
| Project: | Drivers |
| Component/s: | CRUD |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Major - P3 |
| Reporter: | Patrick Freed | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Driver Changes: | Needed | ||||||||||||||||
| Description |
|
According to the CRUD spec, a BulkWriteException contains an optional array of WriteErrors and a single optional WriteConcernError. It isn't immediately clear to me from the spec how drivers should handle the case of multiple write-concern-related errors. Our current implementations seem divided on this too:
The legacy bulk-update spec opts for an array:
Though I'm not sure how relevant that spec is these days. So my questions are:
|
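For context, here is a minimal sketch of the exception shape the CRUD spec currently describes: an optional array of write errors plus at most one write concern error. The Python class and field names below are illustrative only, not any particular driver's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WriteConcernError:
    code: int
    message: str


@dataclass
class WriteError:
    index: int  # position of the failed write within the bulk operation
    code: int
    message: str


@dataclass
class BulkWriteException(Exception):
    # Per the CRUD spec today: an optional array of write errors...
    write_errors: List[WriteError] = field(default_factory=list)
    # ...but only a single slot for a write concern error. The question in
    # this ticket is whether this should instead collect every WC error seen
    # across the commands of a bulk write.
    write_concern_error: Optional[WriteConcernError] = None
```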
| Comments |
| Comment by Kaitlin Mahar [ 29/Jun/22 ] | ||||||
|
isabel.atkinson@mongodb.com: something to consider when writing the new spec. Linking to DRIVERS-716. | ||||||
| Comment by Shane Harvey [ 30/Jan/19 ] | ||||||
|
Added the known types of errors to the read and write concern spec in SPEC-1207. Here's a link to the section: https://github.com/mongodb/specifications/blob/master/source/read-write-concern/read-write-concern.rst#writeconcernerror-examples | ||||||
| Comment by Bernie Hackett [ 30/Jan/19 ] | ||||||
|
Excellent. I stand corrected. Can we get that list into some relevant spec? | ||||||
| Comment by Shane Harvey [ 29/Jan/19 ] | ||||||
|
While I agree that it should be a rare event, you can definitely have multiple different writeConcernErrors in practice, both on a replica set and sharded cluster. The set of possible writeConcernErrors is actually quite large because they include errors caused by shutdown, stepdown, interruption, maxTimeMS, and wtimeout. See my comment that attempts to list them all. | ||||||
| Comment by Bernie Hackett [ 29/Jan/19 ] | ||||||
|
My argument wasn't that multiple WC errors can't occur. My argument was that it's unlikely that multiple different WC errors would occur. What are the WC errors? Paraphrasing the error messages, we have:

- "not enough nodes to satisfy write concern" (you configured w:5 but only have 4 members in the replica set config)
- "journaling is disabled" (you set j: true but for some reason journaling is disabled on the server)
- "timed out waiting for replication" (you set wtimeoutMS to 500 and replication took longer than that)

What else? The first two errors are fatal and will just be spammed for every operation you attempt; the third will likely be intermittent. Given that the first and second errors are fatal, I don't see how you can have two different WC errors when talking to a replica set. When talking to a sharded cluster I can see it, but since the first two are fatal and seemingly a configuration mistake, wtimeout seems like the only one we really have to care about. Maybe I'm wrong about all the possible WC errors. Maybe the first step is to enumerate them. | ||||||
| Comment by Shane Harvey [ 29/Jan/19 ] | ||||||
|
For reference, PyMongo adds any writeConcernErrors to a list and includes all of them in the BulkWriteError raised to the user:
However, the problem still remains that these errors are not correlated to a set of writes. | ||||||
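For illustration, a rough sketch of that accumulation pattern follows. This is not PyMongo's actual code; the `run_command` helper and the `execute_bulk` function are hypothetical, while the reply fields (`writeErrors`, `writeConcernError`) follow the server's write command reply shape.

```python
# Illustrative sketch only (not PyMongo's internals): collect every
# writeConcernError returned by the commands of a bulk write and include
# all of them in the error raised to the caller.
class BulkWriteError(Exception):
    """Hypothetical error type carrying a details document."""

    def __init__(self, details):
        super().__init__(details)
        self.details = details


def execute_bulk(commands, run_command):
    """Run each split command and aggregate errors. `run_command` is an
    assumed helper that sends one write command and returns the server
    reply as a dict."""
    details = {"writeErrors": [], "writeConcernErrors": []}
    for command in commands:
        reply = run_command(command)
        details["writeErrors"].extend(reply.get("writeErrors", []))
        wce = reply.get("writeConcernError")
        if wce is not None:
            # Keep every WC error seen rather than only the last one; note
            # that none of them is correlated with a specific write.
            details["writeConcernErrors"].append(wce)
    if details["writeErrors"] or details["writeConcernErrors"]:
        raise BulkWriteError(details)
```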
| Comment by Jeremy Mikola [ 29/Jan/19 ] | ||||||
|
From a chat session with tess.avitabile in {{#server-ds-replication}}:
So, the take-away is that multiple WC errors can happen over the course of a multi-command bulk write (i.e. insertMany() and bulkWrite()). It's still unclear how useful any write concern error (i.e. BulkWriteException.writeConcernError) will be for the user, since there's no association between that particular WC error and a write command executed during the batch, noting that a WC error on the final command may be more relevant than one from an earlier command on the same socket.

For now, I propose we continue to report the last seen WC error but consider allowing drivers to report all WC errors seen over the course of a bulk write. This would constitute a spec change to allow/introduce a BulkWriteException.writeConcernErrors array. I believe libmongoc already does this in _mongoc_write_result_merge() (see also: Bulk Operation Write Concerns).

Long term, we should probably discuss this with the product team and see if improving the way we report WC errors would be genuinely useful for our users (perhaps a list of indexes to which the WC error pertains). | ||||||
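A minimal sketch of what that proposed spec change could look like, assuming the existing singular field is kept for compatibility and continues to hold the last seen WC error; the names mirror the proposal above but are otherwise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class BulkWriteException(Exception):
    write_errors: List[Dict[str, Any]] = field(default_factory=list)
    # Existing field: the last WC error seen over the course of the bulk write.
    write_concern_error: Optional[Dict[str, Any]] = None
    # Proposed addition: every WC error seen, in the order the commands ran.
    write_concern_errors: List[Dict[str, Any]] = field(default_factory=list)
```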
| Comment by Jeremy Mikola [ 29/Jan/19 ] | ||||||
Naturally, I didn't consider sharding (this line would probably make for a great drinking game). Thinking about this some more, I also think there may even be complications for a replica set. Consider that an early write command in the batch hit a wtimeout on w:majority, and a subsequent write command hit a failover but is successfully retried and replicates within the timeout. The failover could mean that the earlier write was never majority-acknowledged and was rolled back, so drivers would do well to raise that WC error to users.

That said, the main point of my last comment wasn't about ignoring WC errors, although I definitely segued into that in my second paragraph. In the first paragraph, I was arguing that there is a legitimate case where multiple WC errors could be relevant to the user. If we consider this alongside retryable writes and potentially multiple failovers during the execution of a bulk write, I think it gets more complicated. Unlike write errors, WC errors are not associated with any operation index or write command (insert|update|delete) in the batch, so I'm not even sure how users could meaningfully make sense of multiple WC errors. The best conclusion may just be "I know something(s) in the batch failed to replicate and, depending on my WC, might have been rolled back".

It may be worth soliciting some advice from the server team, so I'll see if I can get someone from #server-ds-replication to chime in here. | ||||||
| Comment by Bernie Hackett [ 28/Jan/19 ] | ||||||
|
I see what you're saying. Currently PyMongo, for example, would report the write concern error at the end of processing the unordered bulk operations. You're proposing that we shouldn't report it at all if the final batch doesn't have a write concern error? That might work for a replica set, but I'm not sure it would for a sharded cluster. What happens if your last batch just doesn't hit the laggy shard(s)? | ||||||
| Comment by Jeremy Mikola [ 28/Jan/19 ] | ||||||
|
behackett: I think it is possible for there to be multiple WC errors. Consider an unordered (effectively continue-on-error behavior) bulk write where the driver has organized operations into separate insert, update, and delete commands. The insert and update commands both time out awaiting replication and the driver receives a write concern error, while the delete command does replicate within the wtimeout period.

Assuming all three commands were executed serially on the same connection, drivers should be able to trust that the write concern on the last issued command speaks for all previous writes. In the example above (insert and update time out, but the final delete succeeds), drivers could disregard the previous write concern errors; in effect, only the final command's WC error (or lack thereof) matters.

That said, if a driver ever decided to execute bulk writes in parallel across different connections, or the server allowed async command execution, this rule would go out the window, as we could not reliably determine that the "final" command's WC response speaks for the others. Do you concur? | ||||||
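A small sketch of the serial-execution policy described in that comment. `run_command` is an assumed helper that returns the server reply for one of the split commands; the function only illustrates how earlier WC errors could be disregarded, and is not any driver's actual behavior.

```python
def last_write_concern_error(commands, run_command):
    """Illustrative only: when the split commands run serially on one
    connection, report only the final command's writeConcernError (or
    nothing, if the final command was acknowledged in time)."""
    last_wce = None
    for command in commands:
        reply = run_command(command)
        # Overwrite rather than accumulate: an earlier wtimeout is
        # superseded if a later command's writes replicate successfully.
        last_wce = reply.get("writeConcernError")
    return last_wce
```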
| Comment by Patrick Freed [ 09/Jan/19 ] | ||||||
|
Even when ordered is false, there's no possibility that different batches could have different write concern errors? | ||||||
| Comment by Bernie Hackett [ 09/Jan/19 ] | ||||||
|
There will only be one write concern error, even if it is reported multiple times across batches (timed out waiting for replication, not enough data bearing nodes, etc.), so you don't need to report an array of them. On the other hand, you can have multiple different write errors (duplicate key, document too large, etc.) in a single batch, or across multiple batches, when ordered is false. |