[SERVER-12280] allow $out to append to/merge with existing collection instead of replacing it Created: 07/Jan/14  Updated: 10/Jul/19  Resolved: 07/Aug/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.1.2

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Nicholas Zolnierz
Resolution: Done Votes: 40
Labels: usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-35896 Support "replaceDocuments" mode in $out Closed
depends on SERVER-35897 Support dropTarget false in $out Closed
is depended on by SERVER-18027 Support aggregation $out to sharded c... Closed
is depended on by SERVER-544 Merge/overwrite collections function Closed
is depended on by SERVER-6025 Similar mysql syntax "INSERT ... SELE... Closed
Duplicate
is duplicated by SERVER-29119 Aggregation: Allow $out the ability t... Closed
is duplicated by SERVER-25886 allow upsert option via aggregation $... Closed
is duplicated by SERVER-30320 Feature Request: Merge Closed
Related
is related to SERVER-12342 Add dropTarget option $out operator Backlog
Backwards Compatibility: Fully Compatible
Sprint: Query 2018-08-13
Participants:
Case:

 Description   

Currently specifying $out with existing collection name will replace that collection - there are cases where we might want to append to it or merge with it.

This is a request for such an option.



 Comments   
Comment by aniket patil [ 04/Sep/18 ]

Thanks Kelsey .

Comment by Kelsey Schubert [ 21/Aug/18 ]

Hi aniket@vrhythmstech.com,

You can download MongoDB 4.1.2, from our download center and selecting development releases. As a reminder, 4.1.2 is a development release and is not intended for production use, but is a great way to try out new features. The 4.1 series will evolve into 4.2, which will be for production.

Kind regards,
Kelsey

Comment by aniket patil [ 20/Aug/18 ]

Thank you guys.. From Where should I download 4.1.2 version. 

Comment by Juan Antonio Roy Couto [ 07/Aug/18 ]

Thank you guys, very good news!!! Great job!

Comment by Naidu S [ 07/Aug/18 ]

WOW.....Thank you. 

Comment by Nicholas Zolnierz [ 06/Aug/18 ]

Hi All -

We've modified the $out aggregation stage to accept the following format:

$out: {
    mode: "replaceCollection" | "replaceDocuments" | "insertDocuments",
    to: "targetCollection",
    uniqueKey: {<set of fields that uniquely identify the document>},
}

The mode "replaceCollection" behaves similar to the current syntax, dropping the target collection and replacing its contents with the result of the aggregation. The two new modes will attempt to append the document ("insertDocuments") or replace the document if it exists ("replaceDocuments"). Note that "replaceDocuments" will insert the document if one doesn't exist that matches the uniqueKey.

Although there are some limitations, sample usage of the new modes can be found here and here. Please consult the documentation once the feature is released if you have any questions.

Regards,
Nick

Comment by brezniczky [ 02/Jan/18 ]

+1
(sorry I just spotted the "Vote for this ..." in the top right - will be less commentful next time!)

Comment by Naidu S [ 03/Oct/17 ]

Hello Team,

+1, Please Please Please add this feature to V3.6 release. Thank you.

Comment by Joshua Austill [ 31/Jul/17 ]

Hello all,

I ended up here after trying to do a bulk insert and merge into an existing collection. In Microsoft SQL Server this is a common pattern using the merge feature, documented here.

I think this feature would add a very similar set of features to MongoDB and would be a huge addition. Considering that this is functionality that both MySQL and PostgreSQL don't have to the best of my knowledge. Once this feature becomes a reality what I'd really like to see is a cross reference to the link above side by side. With this merge statement in SQL looks like this in mongodb. Just like the side by side examples for CRUD here.

merge in MSSQL is a very complex and complete feature, so I wouldn't expect every possibility to be possible, but the basic update/insert/delete features would be amazing. Even if it required two aggregations where one did an upsert and the other did a delete I think that would be kinda huge.

Just my two cents, thanks for working on this y'all!

Comment by Zohar Bar-Yehuda [ 27/Jul/17 ]

+1 for this really missing option.
We have a huge data structure which is split to monthly collections (sort of partitions since these are not supported in MongoDB), and we'd like to have multiple aggregations on these multiple collections which would append their results to a single collection. Furthermore, to improve speed, we even split heavy aggregations on each single collection to several slices (by the index range) to achieve parallelization at the DB level - performing each sub-aggregation on a different thread (also overcoming disk IO bottlenecks on the way).

We could REALLY use an option which would append all the results to a single collection (without having to move all the data between the DB and the application twice). That's because our next stage after this are other aggregations which would ideally be performed on this "intermediate" collection.

Comment by Juan Antonio Roy Couto [ 18/Apr/17 ]

Hi @Asya, I am not sure whether what I am going to say makes sense or not, but it could be a possibility. When we run an aggregation query using the $out stage we have three cases:
1 - We can project the original _id of the document to the new collection
2 - We can not project the original _id of the document to the new collection
3 - We can create a new document that does not exist in the original collections, so we do not have an original _id
Just now, if I am not wrong, in case 1 MongoDB uses the original _id and in cases 2 and 3 MongoDB calculates a new _id, right?
What I am thinking about is, when the user chooses the add option and not the replace option, to have two _id fields in the new collection, '_id' (always calculated by MongoDB) and 'original_id', so:
Case 1: original_id = value of _id field in the original collection
Cases 2 and 3: original_id = null
In respect to other unique fields, replace the document or skip it could be an option for the user.
Thanks

Comment by Dov Rosenberg [ 02/Apr/17 ]

@Asya Kamsky - there could be an option to determine, but I'd think by default it should upsert, so existing records would have the attributes in the $out added to them (or replaced if already present).

Comment by Asya Kamsky [ 02/Apr/17 ]

If appending to existing collection is allowed, what should happen to existing records when the "_id" value is the same?

Comment by Juan Antonio Roy Couto [ 10/Mar/17 ]

Hello. Another use case where append documents to an existing collection via $out would be useful is the periodic feed of statistics data from other collections. Thanks!

Comment by Michael Latta [ 16/Nov/16 ]

This is critical for performance of our product in several use cases. Having to round trip to the app server is a waste of bandwidth and time.

Comment by NOVALUE Mitar [ 13/Nov/15 ]

That is especially problem since copyTo has been deprecated. So one cannot use the workaround of outputting to a temporary collection and then copying it over to a wanted one.

Comment by Aiman Abedi [ 17/Sep/14 ]

This feature would be great to have. We love the performance of the aggregation framework we have been experiencing and it's really a bummer to have to move back to map-reduce where the merge option is already supported but the performance are nowhere close to this framework for our case. Looking forward to seeing this released!

Comment by Asya Kamsky [ 14/May/14 ]

Agreed that merge or append would be useful options but since conflicts are possible (with _id or other unique or otherwise constrained fields) there have to be options or clear defaults, like replace duplicate _id docs or skip if constraint violated (similar to mongorestore though that one has flag to drop and adds by default).

Comment by Gary Murakami [ 14/May/14 ]

Here's an example use case: https://github.com/gjmurakami-10gen/mongo-musicbrainz/blob/master/script/merge_agg.rb

This program denormalizes collections common to the Active Record / Object Relational Mapping technique.
It merges one-one and one-many child relations into a parent using the aggregation framework, each child requires an aggregation pipeline, and currently server-client round trip to append to a collection. With an append capability, all but one round-trip overhead would go away. The remaining round-trip could be eliminated if we had a way to remove fields with null values via aggregation.

Generated at Thu Feb 08 03:28:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.