[SERVER-43860] Pipeline style update in $merge can produce unexpected result Created: 07/Oct/19 Updated: 29/Oct/23 Resolved: 14/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.2.0 |
| Fix Version/s: | 4.2.2, 4.3.2 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Anton Korshunov | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Major Change | ||||||||||||||||||||||||||||||||
| Backport Requested: |
v4.2
|
||||||||||||||||||||||||||||||||
| Sprint: | Query 2019-11-04, Query 2019-11-18 | ||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Linked BF Score: | 50 | ||||||||||||||||||||||||||||||||
| Description |
|
When a $merge stage with a custom pipeline cannot match a document in the target collection, it will insert a new document created by running the pipeline on an empty document. For example,
Here, we execute an upsert with a custom pipeline. For pipeline updates, if we don’t match any documents, we generate a new document to insert by running the pipeline with an empty input document (and, in the case of $merge, the original document as $$new). In the example above, that means we’re doing this: This is also inconsistent with other whenMatched modes. E.g., with 'whenMatched: replace, whenMatched: insert', we'd insert the document { "_id" : "2019-05", "thumbsup" : 14, "thumbsdown" : 10 }. It may also be confusing that we're executing a pipeline defined in the whenMatched branch, when we fall under the whenNotMatched branch. We should consider different options to see if user experience can be improved. This could be a simple solution to update our documentation to clearly describe the existing behaviour, or just the semantics of pipeline style updated with $merge (for example, but inserting the original document accessed via $$new when there is no match). |
| Comments |
| Comment by Githook User [ 03/Dec/19 ] | |||
|
Author: {'email': 'bernard.gorman@mongodb.com', 'name': 'Bernard Gorman', 'username': 'gormanb'}Message: (cherry picked from commit 23e55cb3d041236f399f7095df31cd3e3da491cc) | |||
| Comment by Githook User [ 02/Dec/19 ] | |||
|
Author: {'email': 'bernard.gorman@mongodb.com', 'name': 'Bernard Gorman', 'username': 'gormanb'}Message: | |||
| Comment by Githook User [ 14/Nov/19 ] | |||
|
Author: {'username': 'gormanb', 'email': 'bernard.gorman@mongodb.com', 'name': 'Bernard Gorman'}Message: | |||
| Comment by Asya Kamsky [ 12/Nov/19 ] | |||
|
Btw, note that the above example can be easily fixed by using $sum instead of $add - that will treat nulls and missing as 0's. | |||
| Comment by Bernard Gorman [ 07/Oct/19 ] | |||
|
asya: I think the behaviour you outline above could be quite easily achieved within a single WUOW because, when issuing upsert from $merge, we serialize the source document along with the update command, which then becomes available to the update pipeline during execution via the $$new variable. So we could add logic to the upsert subsystem whereby, if the update does not match any documents AND the $$new field exists in the update spec, then we just insert $$new into the collection as-is. | |||
| Comment by Asya Kamsky [ 07/Oct/19 ] | |||
|
I don't think that this can be successfully satisfied with a single write operation. If we do update with the pipeline that's demonstrably wrong when the document does not exist in the collection we are merging into. That means logically correct operation sequence would be:
This is a scenario where if the insert attempt is not done in a single "unit of work" (whether transaction or something else) then it's possible that insert will fail due to another thread inserting the matching document. Then in theory the update on error should now work unless this document was deleted between failed insert and second update. (similar sequence would be to just always insert first and on error run update) The question is whether this sequence is better than what currently happens which is in my opinion yes, the only other option is really for us to have some weird restriction that when you specify custom pipeline, it really must handle both when matched and when not matched. Or we disallow "insert" on whenNotMatched in this combination (error and discard are fine). |