[SERVER-42137] Allow aggregation $merge stage to write to a collection that the query also reads from Created: 10/Jul/19  Updated: 29/Oct/23  Resolved: 20/Nov/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.2.0-rc2
Fix Version/s: 4.3.2

Type: Improvement Priority: Major - P3
Reporter: Clare Scally Assignee: Mihai Andrei
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-13236 Investigate changes in SERVER-42137: ... Closed
Related
is related to SERVER-37378 Prevent $out to the same namespace as... Closed
is related to SERVER-38360 Disallow pipelines which read from th... Closed
is related to SERVER-60788 merge_causes_infinite_loop.js attempt... Closed
Backwards Compatibility: Minor Change
Sprint: Query 2019-11-18, Query 2019-12-02

 Description   

Enable functionality to permit $out to merge/append to the aggregation source collection.



 Comments   
Comment by Githook User [ 20/Nov/19 ]

Author: Mihai Andrei <mihai.andrei@mongodb.com>

Message: SERVER-42137 Allow aggregation stage to write to a collection that the query also reads from
Branch: master
https://github.com/mongodb/mongo/commit/b4f98455b02fa64dd23be3512ef83649a5395b76

Comment by Guy Harrison [ 16/Jul/19 ]

Yes, we duplicate multiple documents, but only those with a specific key value.

The prototype pipeline we used in 4.1 went like this:

 

$match: select the documents to be cloned using a key value
$project: remove the existing key
$addFields: add the new key
$out: append the new documents to the same collection

 

4.1 allowed this behavior with mode:'insertDocuments'. 
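
For illustration, the four stages above might be written as follows with the released $merge syntax in place of the 4.1 $out mode. This is only a sketch: the helper buildClonePipeline, the collection name "orders", and the field "tenantId" are hypothetical and do not come from this ticket, and dropping _id in the $project stage is an assumption made so that each clone receives a newly generated _id.

```javascript
// Sketch only: the helper, collection name, and field names are hypothetical.
function buildClonePipeline(collection, keyField, sourceKey, newKey) {
  return [
    // Select the documents to be cloned using a key value.
    { $match: { [keyField]: sourceKey } },
    // Remove the existing key. Dropping _id too is an assumption made here
    // so that each clone gets a freshly generated _id when it is inserted.
    { $project: { _id: 0, [keyField]: 0 } },
    // Add the new key, so the clones cannot satisfy the $match above.
    { $addFields: { [keyField]: newKey } },
    // Append the new documents to the collection the pipeline reads from.
    { $merge: { into: collection, whenMatched: "fail", whenNotMatched: "insert" } },
  ];
}

const pipeline = buildClonePipeline("orders", "tenantId", "A", "B");
console.log(JSON.stringify(pipeline, null, 2));
// In the shell, against a server version that includes this change:
//   db.orders.aggregate(pipeline)
```

Because the $addFields stage rewrites the key that $match filters on, documents written by $merge should not be selected again by the same pipeline.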

Comment by Asya Kamsky [ 15/Jul/19 ]

guy@southbanksoftware.com I'm trying to understand the use case. You say you are modifying the documents into the same collection, yet you're using insert mode. When you say you are cloning the documents, does this mean that if you start with ten documents, you expect to end up with twenty?

How do you generate a new unique "merge on" field?

Comment by Guy Harrison [ 14/Jul/19 ]

We have a use case in which we are cloning a set of documents into the same collection (with some modifications). We were enthusiastically awaiting the enhancements to $out that appeared in 4.1, which allowed us to do this cloning on the server side without a network round trip. It was disappointing to see, when 4.2 arrived, that the capability which existed in 4.1 had been removed.

I do understand that in MongoDB there is a chance of an infinite loop if the $merge stage generates documents that might then be read by the $match stage, if there is one, or, even worse, if there is no $match at all.

In our use case, the $match clause specifically selects documents that cannot match those output by the pipeline: there's a $project and an $addFields that modify the output to prevent that from occurring.

The functionality is essentially INSERT INTO t1 SELECT * FROM t1.  In a relational database, a query snapshot would prevent the SELECT from reading anything created by the INSERT.  

So we would really like to have this capability. Given what I understand of MongoDB's architecture, it would seem that the easiest way to do this would be to write the output to a temporary collection until all earlier stages of the pipeline are complete, and then append that output to the source collection.
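
The staging idea described above can be approximated from the client side with two pipelines, sketched below. This is a hypothetical workaround under stated assumptions, not a description of how the server implements the change: the helper buildStagedClone and the collection names are invented for illustration, and the read-side stages are assumed to produce documents whose _id values do not collide with existing ones.

```javascript
// Sketch only: the helper and collection names are hypothetical, and this is
// a client-driven workaround rather than the server's own mechanism.
function buildStagedClone(source, staging, stages) {
  return {
    // Phase 1: run the read-side stages and write the results to staging.
    stagePipeline: [...stages, { $out: staging }],
    // Phase 2: read only from staging and append into the source collection.
    appendPipeline: [
      { $merge: { into: source, whenMatched: "fail", whenNotMatched: "insert" } },
    ],
  };
}

const plan = buildStagedClone("orders", "orders_staging", [
  { $match: { tenantId: "A" } },
  { $project: { _id: 0, tenantId: 0 } },
  { $addFields: { tenantId: "B" } },
]);
console.log(JSON.stringify(plan, null, 2));
// In the shell:
//   db.orders.aggregate(plan.stagePipeline)
//   db.orders_staging.aggregate(plan.appendPipeline)
//   db.orders_staging.drop()
```

Phase 2 never reads from the source collection, so it cannot pick up its own output. The cost is two aggregate commands and a temporary collection, although no documents travel to the client.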

 
