[SERVER-44827] Aggregate with $merge fails with command error, not write error Created: 25/Nov/19 Updated: 27/Oct/23 Resolved: 11/Dec/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Patrick Freed | Assignee: | Anton Korshunov |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | qexec-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Query 2019-12-16 | ||||||||
| Participants: | |||||||||
| Description |
|
An aggregate with $merge will fail with a command error where other operations would typically fail with ok: 1 plus a write error (e.g. duplicate key). Is this the intended behavior? In the design doc for this stage, a duplicate key error is shown to be returned as a command error, but the doc doesn't state any rationale for it being in that format. This leads to a weird situation where the aggregate failed with ok: 0, but documents were actually inserted into the into collection with no indication that they were. e.g.
It also presents a problem for statically typed drivers. If the server returns a command error and a write concern error in the same response, those drivers will simply ignore the write concern error, since write concern errors are only generally expected and parsed in the context of write errors (which occur if ok: 1).
e.g. this would just throw/return a CommandError/Exception driver side, with no mention of a write concern error. Additionally, if the server only returns a WriteConcernError, the driver will throw a WriteException, as per the drivers' CRUD specification.
e.g. this would throw a WriteException(WriteError: null, WriteConcernError: ...). |
| Comments |
| Comment by Anton Korshunov [ 09/Dec/19 ] |
|
So, I looked at the code and tracked it back to when it was added first (although this code had largely evolved since then, the error handling mechanism wasn't drastically changed). Firstly, Charlie is right that we upconvert the first write error to a command error. It is done here. However, I'm not sure that it was motivated by the 16MB bulk errors limit and a graceful way out. Likely we just plumbed in writing logic into the existing error handling mechanism in the aggregation pipeline. There was some discussion around reporting batch results, but it was slightly different topic. So, for non-aggregation write operations we simply return a serialized WriteResult object back to the client, possibly appending a writeConcernError info to the reply. For aggregation, the error handling mechanism is different and all aggregate command errors are returned as command errors. So, be it a non-OK status or an exception, it will be returned as a command error. This was so since day one when $out was added in v2.6. This does makes sense to me, as a write operation (in case of $merge or $out pipelines) is executed in the context of an outer operation (which is a pipeline command), and so any error should be reported as a command error, like in case of any other read-only pipeline. Note that reporting the writeConcernError is different from the normal codepath. That is, a command may succeed on the current node (and reported as "ok: 1") but the writeConcernError may still be added to the reply if we detect such an error. This is demonstrated in examples 2 and 3 from the original description. That said, this behaviour seems working as designed to me. |
| Comment by Charlie Swanson [ 25/Nov/19 ] |
|
Passing to Query execution to triage. At a glance, I think this is by design because with $merge there may be large amounts of data being written, the bulk errors for which could easily exceed 16MB, and we wouldn't have a graceful way out at that point. Instead, we chose to upconvert the first write error to a command error. |