  1. Documentation
  2. DOCS-13226

Investigate changes in SERVER-43860: Pipeline style update in $merge can produce unexpected result

      Description

      Downstream Change Summary

      Previously, a $merge aggregation with parameters {whenMatched: [ pipeline ], whenNotMatched: "insert"} was implemented by generating a pipeline-style update command with {upsert: true}. However, this did not correctly capture the semantics of the $merge statement. If no document in the target collection matched the document from the source collection (that is, we hit the {whenNotMatched: "insert"} condition), the source document was discarded instead of being inserted, and an entirely new document was generated by running the 'whenMatched' pipeline over an empty input document. This is logically incorrect, and inconsistent with the behaviour of {whenNotMatched: "insert"} in all other contexts.

      This patch fixes the above $merge mode so that its behaviour matches the expectation: if the source document matches a document in the target collection, the 'whenMatched' pipeline is executed; otherwise we hit the 'whenNotMatched' condition and insert the source document into the target collection as-is.
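
      The difference between the two behaviours can be sketched with a small in-memory model in plain JavaScript. This is only an illustration, not the server implementation: the Map-based target, the function names mergeBeforeFix and mergeAfterFix, and the sample document are all hypothetical, and the model assumes the pre-fix upsert seeded the new document with the "on" field only.

```javascript
// Hypothetical in-memory model, not the server implementation.
// `target` is a Map keyed by _id; `pipeline(existing, newDoc)` stands in for
// the whenMatched pipeline, with `newDoc` playing the role of $$new.

// Stand-in for a whenMatched pipeline such as
// [ { $set: { qty: "$$new.qty" } } ].
function pipeline(existing, newDoc) {
  return { ...existing, qty: newDoc.qty };
}

// Before the fix: with no match, the pipeline ran over an otherwise empty
// document and the source document was discarded. Assumption: the new
// document keeps the "on" field (here _id), as in the linked ticket's output.
function mergeBeforeFix(target, source) {
  const existing = target.get(source._id) || { _id: source._id };
  target.set(source._id, pipeline(existing, source));
}

// After the fix: with no match, the source document is inserted as-is.
function mergeAfterFix(target, source) {
  const existing = target.get(source._id);
  target.set(source._id, existing ? pipeline(existing, source) : { ...source });
}

const source = { _id: 1, name: "widget", qty: 5 };

const before = new Map();
mergeBeforeFix(before, source);

const after = new Map();
mergeAfterFix(after, source);

console.log(before.get(1)); // { _id: 1, qty: 5 }
console.log(after.get(1)); // { _id: 1, name: 'widget', qty: 5 }
```

      In this model the pre-fix path loses the name field because only the pipeline's output is stored, while the post-fix path keeps the whole source document.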

      Depending on the exact nature of the source document and 'whenMatched' pipeline, this may result in a significant change in behaviour from that observed by the user in existing 4.2 versions. Since this is a correctness bug, we will be backporting the fix to 4.2.2 (BACKPORT-5471). Finally, as a side-effect of this change, the $$new variable used to refer to the source document in the 'whenMatched' pipeline is now reserved, and cannot be overridden by the user.

      Description of Linked Ticket

      When a $merge stage with a custom pipeline cannot match a document in the target collection, it will insert a new document created by running the pipeline on an empty document. For example,

      db.monthlytotals.drop()
      db.votes.insertOne(
         { date: new Date("2019-05-07"), "thumbsup" : 14, "thumbsdown" : 10 }
      )
      db.votes.aggregate([
         { $match: { date: { $gte: new Date("2019-05-07"), $lt: new Date("2019-05-08") } } },
         { $project: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } }, thumbsup: 1, thumbsdown: 1 } },
         { $merge: {
               into: "monthlytotals",
               on: "_id",
               whenMatched:  [
                  { $addFields: {
                      thumbsup: { $add:[ "$thumbsup", "$$new.thumbsup" ] },
                      thumbsdown: { $add: [ "$thumbsdown", "$$new.thumbsdown" ] }
                  } } ],
               whenNotMatched: "insert"
         } }
      ])
      printjson(db.monthlytotals.find().toArray())
      // Output before the fix:
      // [ { "_id" : "2019-05", "thumbsup" : null, "thumbsdown" : null } ]
      

      Here, we execute an upsert with a custom pipeline. For pipeline updates, if we don’t match any documents, we generate a new document to insert by running the pipeline with an empty input document (and, in the case of $merge, the original document as $$new). In the example above, that means we’re doing this:
       
      thumbsup: { $add:[ MISSING, 14 ] }
      thumbsdown: { $add:[ MISSING, 10 ] }
       
      But the semantics of the $add expression are such that anything added to null or missing produces null.
      This could be confusing to users, as one might expect the inserted document to be the one produced by the $project stage, e.g., { "_id" : "2019-05", "thumbsup" : 14, "thumbsdown" : 10 }.
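
      The null values follow directly from how $add treats null or missing operands. A minimal plain-JavaScript sketch of that rule is below; the add helper and the MISSING and newDoc names are hypothetical stand-ins for illustration, and the helper models numeric operands only.

```javascript
// Simplified model of $add for numeric operands: if any operand is
// null or missing (undefined here), the result is null.
function add(...operands) {
  if (operands.some((v) => v === null || v === undefined)) {
    return null;
  }
  return operands.reduce((sum, v) => sum + v, 0);
}

const MISSING = undefined; // field absent from the empty input document
const newDoc = { thumbsup: 14, thumbsdown: 10 }; // plays the role of $$new

// What the whenMatched pipeline computed on the no-match path before the fix.
const inserted = {
  thumbsup: add(MISSING, newDoc.thumbsup),
  thumbsdown: add(MISSING, newDoc.thumbsdown),
};

console.log(inserted); // { thumbsup: null, thumbsdown: null }
```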

      This is also inconsistent with other whenMatched modes. E.g., with 'whenMatched: replace, whenNotMatched: insert', we'd insert the document { "_id" : "2019-05", "thumbsup" : 14, "thumbsdown" : 10 }.

      It may also be confusing that we're executing a pipeline defined in the whenMatched branch, when we fall under the whenNotMatched branch.

      We should consider different options to see if the user experience can be improved. This could be as simple as updating our documentation to clearly describe the existing behaviour, or it could mean adjusting the semantics of pipeline-style updates with $merge (for example, by inserting the original document accessed via $$new when there is no match).

      Scope of changes

      Impact to Other Docs

      MVP (Work and Date)

      Resources (Scope or Design Docs, Invision, etc.)

            Assignee:
            jeffrey.allen@mongodb.com Jeffrey Allen
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team

              Created:
              Updated:
              Resolved:
              4 years, 6 weeks, 2 days ago