[SERVER-58185] Expected update pipeline behavior of $merge Created: 30/Jun/21  Updated: 03/Sep/21

Status: Investigating
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Alice Thum Assignee: Katya Kamenieva
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: QE 2021-08-23
Participants:

 Description   

I have some sample data as follows:

> db.first.find()
{ "_id" : 1, "flavor" : "chocolate" }
 
 
> db.another.find()
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 }
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
{ "_id" : 3 }
{ "_id" : 4, "item" : "ginger" }

When running a pipeline with $merge:

db.first.aggregate([
  {
    $merge: {
      "into": "another",
      "whenMatched": [
        {
          "$project": {
            "_id": 0
          }
        }
      ]
    }
  }
])

This results in no error, and no updates to the "another" collection. Is this expected? I assumed this would not be allowed as there is a restriction on the "whenMatched" pipeline not updating the "on" field.

Another example, using a different "on" field:

> db.second.find()
{ "_id" : ObjectId("60dcae7e4a59a35f0ed6b34b"), "item" : "almonds" }
 
 
> db.another.find()
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 }
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }
{ "_id" : 3 }
{ "_id" : 4, "item" : "ginger" }

Pipeline:

db.second.aggregate([
  {
    $merge: {
      "into": "another",
      "whenMatched": [
        {
          "$project": {
            "item": 0
          }
        }
      ]
    }
  }
])
uncaught exception: Error: command failed: {
        "ok" : 0,
        "errmsg" : "E11000 duplicate key error collection: playground.another index: item_1 dup key: { item: \"almonds\" }",
        "code" : 11000,
        "codeName" : "DuplicateKey",
        "keyPattern" : {
                "item" : 1
        },
        "keyValue" : {
                "item" : "almonds"
        }
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:618:17
assert.commandWorked@src/mongo/shell/assert.js:708:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1046:12
@(shell):1:1

It seems that in this case I run into a duplicate key error as we attempt to insert a results document. What is the correct behavior in this case?



 Comments   
Comment by Asya Kamsky [ 03/Sep/21 ]

Second example is invalid because you didn't specify `on:["item"]` if you do then it updates and unsets the `item` field.

Without it it's merging on `_id` so it's doing an insert.

 

Comment by Asya Kamsky [ 03/Sep/21 ]

To rephrase:

 

 db.another.update({},[{$unset:"_id"}]) 

should error just like a regular update does:

 
db.coll.update({},{$unset:{"_id":1}})
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0, "writeError" : { "code" : 66, "errmsg" : "Performing an update on the path '_id' would modify the immutable field '_id'" }}) 

Comment by Asya Kamsky [ 30/Jun/21 ]

> This results in no error, and no updates to the "another" collection. Is this expected?

Oplog examination shows that it does update another collection but it's a replacement with existing document i.e. no-op except for oplog entry...

Comment by Asya Kamsky [ 30/Jun/21 ]

Note that this seems to be related to the fact that pipeline update which tries to change `_id` doesn't give an error:

db.another.update({},[{$unset:"_id"}])
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Examining the oplog entry shows it did a replace of full document

{ "op" : "u", "ns" : "test.another", "ui" : UUID("5f36fb78-396e-47c0-88eb-b3c1195c3cef"), "o" : { "_id" : 1, "item" : "almonds", "price" : 99, "quantity" : 12 }, "o2" : { "_id" : 1 }, "ts" : Timestamp(1625081088, 1), "t" : NumberLong(352), "wall" : ISODate("2021-06-30T19:24:48.978Z"), "v" : NumberLong(2) }

Generated at Thu Feb 08 05:43:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.