[SERVER-30320] Feature Request: Merge Created: 25/Jul/17  Updated: 29/Jan/18  Resolved: 26/Jul/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Joshua Austill Assignee: Asya Kamsky
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-12280 allow $out to append to/merge with ex... Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

If this feature request already exists I couldn't find it, so just point me to that so I can follow it

I had to switch from mongodb to mysql at work this week because I couldn't find a way to merge collections. If this exists, I sure couldn't find it.

Basically, the request is to be able to merge one collection into another. With logic for matching documents that looks like $lookup, and different updates for on match, on missing on source, and on missing on target. I imagine the syntax looking something like this

db.inventory.merge({from: "inventory_temp",
          localField: "sku",
          foreignField: "sku",
          as: "inventory_docs"},
 {status: "inventory_docs.status"},
{$unset: true},
{name: "inventory_docs.name", status: "inventory_docs.status"});

The first paramenter being the $lookup criteria that must result in a 1:1 match or no match. The second being the update if matched. The third being the update if the document is missing on the source, in this case delete it on the target. And lastly the not matched on the target specifies an insert.

I'm keeping this short in case this request already exists, if it doesn't I can certainly flush it out



 Comments   
Comment by Joshua Austill [ 31/Jul/17 ]

Hey Asya,

After reviewing this I believe you are correct, SERVER-12280 would solve my challenge indeed. I will follow it and upvote.

Also, after going through this exercise with you, it hit me that my biggest challenge is that I was approaching the issue wrong. I came from a very long RDBMS background so I was thinking "merge" and not "aggregate". I think my recommendation would be to add documentation that specifically addresses this topic so other people like myself find it right away. Here's the documentation that people like me would want to be able to find. https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql It would be neat to see a side by side SQL merge looks like this in MongoDB doc page just like you already have for basic crud operations like find, update, delete, etc.

Lastly, for documentation sake, the reason for the delete is pretty straight forward. In this case, my MongoDB collection wouldn't have been the SOR. The SOR would have been a VERY old core retail management system that we have read only access to. So whenever they deleted an SKU in that old system, we'd need to remove it from the MongoDB collection. However, the only way to know it was deleted would be to aggregate both sides and see what was missing in this case because the core system provides no updates, they just go missing.

Thanks so much and have a great week!
Joshua

Comment by Asya Kamsky [ 26/Jul/17 ]

Thanks for clarification. I'm going to mark this ticket as a duplicate of SERVER-12280 because I believe when that ticket is resolved you will be able to merge output from your temp collection into the inventory collection.

I'm not sure I understand the purpose of removing the record that doesn't have a match, but if you want those semantics you can already achieve that today:

// version 3.5.10
db.inventory_temp.aggregate([
          {$lookup:{from:"inventory", localField:"sku",foreignField:"sku", as:"curr"}},
          {$unwind:{path:"$curr",preserveNullAndEmptyArrays:true}},
          {$replaceRoot:{newRoot:{$mergeObjects:["$curr","$$ROOT"]}}},
          {$project:{curr:0}},
          {$out:"inventory"}
])
db.inventory.find()
{ "_id" : ObjectId("5978a33cb6a1aa8d363bf012"), "sku" : 1234, "name" : "phone", "status" : "active", "description" : "a cell phone" }
{ "_id" : ObjectId("5978a354b6a1aa8d363bf015"), "sku" : 3456, "name" : "case", "status" : "active" }

$mergeObjects is new in 3.5 but you can do the same thing in versions 3.4.4 and later using $objectToArray, $concatArrays, and $arrayToObject expressions.

// version 3.4.6
db.inventory_temp.aggregate([
     {$lookup:{from:"inventory", localField:"sku",foreignField:"sku", as:"curr"}},
     {$unwind:{path:"$curr",preserveNullAndEmptyArrays:true}},
     {$replaceRoot:{newRoot:{$arrayToObject:{$concatArrays:[{$objectToArray:"$$ROOT"},{$objectToArray:{$ifNull:["$curr",{}]}} ]}}}}, 
     {$project:{curr:0}},
     {$out:"inventory"}
])

Thanks for your suggestion and please watch and upvote SERVER-12280.

Asya Kamsky

Comment by Joshua Austill [ 25/Jul/17 ]

Absolutely!

In this example, inventory would be the target, and inventory_temp would be the source.
db.inventory.merge from inventory_temp reads that way kinda.

Let's say the inventory collection had two documents like

{ sku: 1234, name: "phone", description: "a cell phone", status: "active" } { sku: 2345, name: "case", status: "active" }

Then the inventory_temp table had two documents like

{ sku: 1234, name: "phone", status: "removed" } { sku: 3456, name: "case", status: "active" }

So after the $lookup sku 1234 would match 1234, but 2345 wouldn't match from the source, and 3456 wouldn't match from the target.

The 3 statements would say first update 1234 in the target since it matched the source. Then delete 2346 since it didn't match in the source, and insert 3456 because it didn't match on the target. When you were done the inventory collection would look like this

{ sku: 1234, name: "phone", description: "a cell phone", status: "removed" } { sku: 3456, name: "case", status: "active" }

Merging from another source the delete may have been BAD, so $unset: false would be an ideal default for this.

The use cases for this are pretty straight forward. In my case this week, I had millions and millions of documents to update and using a bulk operation I ran out of memory on my client. So I either had to break it up into smaller chunks, or bulk load the source data to a temp table and then update the target from that source. In this case I can do this in mysql in about 10 minutes and my boss knows it, so he made me switch instead of spending the time writing code to break this up into smaller chunks I think this kind of merge feature would be an amazing thing in mongodb, and really put it over the top for this kind of bulk stuff. But I also realize this isn't the kind of bulk operations that mongodb was originally intended for and this may be a very hard ask

Comment by Asya Kamsky [ 25/Jul/17 ]

jlaustill@gmail.com you have multiple collections mentioned: "inventory", "inventory_temp" - what are you calling your source and what's your target?

Can you give a small example with data?

Generated at Thu Feb 08 04:23:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.