[SERVER-43860] Pipeline style update in $merge can produce unexpected result Created: 07/Oct/19  Updated: 29/Oct/23  Resolved: 14/Nov/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.2.0
Fix Version/s: 4.2.2, 4.3.2

Type: Improvement Priority: Major - P3
Reporter: Anton Korshunov Assignee: Bernard Gorman
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-42511 Remove query knob internalQueryUseAgg... Closed
is depended on by SERVER-44150 MR Agg: Enable tests for output mode ... Closed
Documented
is documented by DOCS-13226 Investigate changes in SERVER-43860: ... Closed
Related
related to SERVER-47581 mongoS does not set 'useNewUpsert' on... Closed
Backwards Compatibility: Major Change
Backport Requested:
v4.2
Sprint: Query 2019-11-04, Query 2019-11-18
Participants:
Linked BF Score: 50

 Description   

When a $merge stage with a custom pipeline cannot match a document in the target collection, it will insert a new document created by running the pipeline on an empty document. For example,

db.monthlytotals.drop()
db.votes.insertOne(
   { date: new Date("2019-05-07"), "thumbsup" : 14, "thumbsdown" : 10 }
)
db.votes.aggregate([
   { $match: { date: { $gte: new Date("2019-05-07"), $lt: new Date("2019-05-08") } } },
   { $project: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } }, thumbsup: 1, thumbsdown: 1 } },
   { $merge: {
         into: "monthlytotals",
         on: "_id",
         whenMatched:  [
            { $addFields: {
                thumbsup: { $add:[ "$thumbsup", "$$new.thumbsup" ] },
                thumbsdown: { $add: [ "$thumbsdown", "$$new.thumbsdown" ] }
            } } ],
         whenNotMatched: "insert"
   } }
])
printjson(db.monthlytotals.find().toArray())
[ { "_id" : "2019-05", "thumbsup" : null, "thumbsdown" : null } ]

Here, we execute an upsert with a custom pipeline. For pipeline updates, if we don’t match any documents, we generate a new document to insert by running the pipeline with an empty input document (and, in the case of $merge, the original document as $$new). In the example above, that means we’re doing this:
 
thumbsup: { $add:[ MISSING, 14 ] }
thumbsdown: { $add:[ MISSING, 10 ] }
 
But the semantics of the $add expression are such that anything added to null or missing produces null.
This could be confusing to the users as one might expect that the inserted document would be the one that it produced by the $project stage, e.g., { "_id" : "2019-05", "thumbsup" : 14, "thumbsdown" : 10 }.

This is also inconsistent with other whenMatched modes. E.g., with 'whenMatched: replace, whenMatched: insert', we'd insert the document { "_id" : "2019-05", "thumbsup" : 14, "thumbsdown" : 10 }.

It may also be confusing that we're executing a pipeline defined in the whenMatched branch, when we fall under the whenNotMatched branch.

We should consider different options to see if user experience can be improved. This could be a simple solution to update our documentation to clearly describe the existing behaviour, or just the semantics of pipeline style updated with $merge (for example, but inserting the original document accessed via $$new when there is no match).



 Comments   
Comment by Githook User [ 03/Dec/19 ]

Author:

{'email': 'bernard.gorman@mongodb.com', 'name': 'Bernard Gorman', 'username': 'gormanb'}

Message: SERVER-43860 Always upsert exact source document for pipeline-insert $merge

(cherry picked from commit 23e55cb3d041236f399f7095df31cd3e3da491cc)
(cherry picked from commit 70ef84cd14388602b3d763b12c6f8448ba62a9d8)
Branch: v4.2
https://github.com/mongodb/mongo/commit/f4404e7ac489bec9b208bc56b1ebb9078a518216

Comment by Githook User [ 02/Dec/19 ]

Author:

{'email': 'bernard.gorman@mongodb.com', 'name': 'Bernard Gorman', 'username': 'gormanb'}

Message: SERVER-43860 Upgrade/downgrade mechanism for new $merge upsert behaviour
Branch: master
https://github.com/mongodb/mongo/commit/70ef84cd14388602b3d763b12c6f8448ba62a9d8

Comment by Githook User [ 14/Nov/19 ]

Author:

{'username': 'gormanb', 'email': 'bernard.gorman@mongodb.com', 'name': 'Bernard Gorman'}

Message: SERVER-43860 Always upsert exact source document for pipeline-insert $merge
Branch: master
https://github.com/mongodb/mongo/commit/23e55cb3d041236f399f7095df31cd3e3da491cc

Comment by Asya Kamsky [ 12/Nov/19 ]

Btw, note that the above example can be easily fixed by using $sum instead of $add - that will treat nulls and missing as 0's.

Comment by Bernard Gorman [ 07/Oct/19 ]

asya: I think the behaviour you outline above could be quite easily achieved within a single WUOW because, when issuing upsert from $merge, we serialize the source document along with the update command, which then becomes available to the update pipeline during execution via the $$new variable. So we could add logic to the upsert subsystem whereby, if the update does not match any documents AND the $$new field exists in the update spec, then we just insert $$new into the collection as-is.

Comment by Asya Kamsky [ 07/Oct/19 ]

I don't think that this can be successfully satisfied with a single write operation. If we do update with the pipeline that's demonstrably wrong when the document does not exist in the collection we are merging into. That means logically correct operation sequence would be:

update with pipeline (upsert option false)
if result modified 0 documents: insert
   if result failure due to duplicate "on" field, update with pipeline (upsert option false)

This is a scenario where if the insert attempt is not done in a single "unit of work" (whether transaction or something else) then it's possible that insert will fail due to another thread inserting the matching document. Then in theory the update on error should now work unless this document was deleted between failed insert and second update.

(similar sequence would be to just always insert first and on error run update)

The question is whether this sequence is better than what currently happens which is in my opinion yes, the only other option is really for us to have some weird restriction that when you specify custom pipeline, it really must handle both when matched and when not matched. Or we disallow "insert" on whenNotMatched in this combination (error and discard are fine).

Generated at Thu Feb 08 05:04:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.