[SERVER-3253] aggregation: unsharded support $out Created: 13/Jun/11  Updated: 16/Nov/21  Resolved: 08/Aug/13

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 2.5.2

Type: New Feature Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Mathias Stearn
Resolution: Done Votes: 89
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-3627 sharded map-reduce output should be p... Closed
depends on SERVER-5932 support cursor based aggregation Closed
depends on SERVER-8850 $tee command for aggregation framework Closed
is depended on by CSHARP-800 Aggregate Command will support $out Closed
is depended on by DRIVERS-111 Support $out aggregation pipeline ope... Closed
is depended on by JAVA-912 Support $out aggregation pipeline ope... Closed
is depended on by SERVER-447 new aggregation framework Closed
is depended on by DOCS-1832 Document $out support Closed
Duplicate
is duplicated by SERVER-610 Temp Collections for Non Map-Reduce Q... Closed

 Description   

Implement output option for aggregation pipelines



 Comments   
Comment by auto [ 11/Oct/13 ]

Author: Matt Dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-3253 expand $out testing
Branch: master
https://github.com/mongodb/mongo/commit/6ba8fc2f0ee77cffd2270556afc5984e44c1b243

Comment by Daniel Pasette (Inactive) [ 03/Sep/13 ]

For those interested in experimenting with $out in the development release (v2.5.2), this feature is documented in the release notes: http://docs.mongodb.org/manual/release-notes/2.6/#aggregation-pipeline-changes

Comment by Shane R. Spencer [ 26/Aug/13 ]

I believe writing to a temporary collection and then renaming it should be optional. I would love to use capped or TTL collections for aggregation results. I'm not sure whether making this optional would also allow output to sharded collections; I'm not too familiar with where code runs when mongos is involved.
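The temporary-collection-then-rename approach Shane refers to can be modelled in a few lines. This is a hypothetical in-memory sketch (a dict mapping collection names to lists of documents), not MongoDB's implementation; the function name and the `tmp.agg_out` name are made up for illustration. It shows why the rename exists: readers of the target never see a partially written result, and a failed run leaves the target untouched.

```python
from typing import Dict, Iterable, List


def out_via_temp_collection(db: Dict[str, List[dict]], target: str,
                            results: Iterable[dict],
                            tmp_name: str = "tmp.agg_out") -> List[dict]:
    """Write results to a temporary collection, then swap it in for target.

    Hypothetical model only: `db` maps collection names to document lists.
    """
    db[tmp_name] = []                  # fresh temporary collection
    try:
        for doc in results:
            db[tmp_name].append(doc)   # every write goes to the temp name
    except Exception:
        del db[tmp_name]               # drop partial output; target untouched
        raise
    db[target] = db.pop(tmp_name)      # "rename" replaces the old target
    return db[target]
```

In this model the rename swaps in a brand-new collection object, so any properties of the old target (such as being capped or having a TTL) would not carry over; that is the trade-off behind making the rename step optional.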

Comment by auto [ 19/Aug/13 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: Use bulk-insert in $out

About 4x faster at simple copy-collection benchmark.
Follow-up to SERVER-3253
Branch: master
https://github.com/mongodb/mongo/commit/f8e37b3b9daffbb721e231bc4418b6c7aac93e8d
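The commit above switches `$out` from per-document inserts to bulk inserts. The general idea, grouping documents into batches so each write call carries many documents, can be sketched as follows. `insert_many` here is a hypothetical callback standing in for whatever write routine is used, and the batch size of 1000 is arbitrary, not taken from the commit.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_write(docs: Iterable[dict],
               insert_many: Callable[[List[dict]], None],
               batch_size: int = 1000) -> int:
    """Send documents to `insert_many` one batch at a time.

    Returns the number of write calls made; with per-document inserts this
    would equal the number of documents instead.
    """
    calls = 0
    for chunk in batches(docs, batch_size):
        insert_many(chunk)
        calls += 1
    return calls
```

For 2500 documents and a batch size of 1000, this makes three write calls instead of 2500.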

Comment by auto [ 26/Jul/13 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: SERVER-3253 Unsharded $out
Branch: master
https://github.com/mongodb/mongo/commit/ea776af8e3f982c488ca6d7476ab79e20733e9a9

Comment by Asya Kamsky [ 01/Jul/13 ]

breakphreak, I suggest you ask about this on Google Groups (mongodb-user), because something seems very strange if find takes minutes for a query similar to one you are aggregating in seconds.

Comment by BreakPhreak [ 30/Jun/13 ]

+1
Getting results for the queries takes way too long using regular 'find'. With aggregation it's rocket-fast (surprise!), but the result is limited to the maximum document size (16MB). BTW, with aggregation I get this message within seconds, while with 'find' it takes minutes to retrieve the same 16MB of data. So, in a way, this feature would be a lifesaver (or at least a saver of the busy-waiting part of our lives). Thanks for considering.

Comment by Aliaksandr Rahalevich [ 26/Jun/13 ]

+1 for this feature. I would appreciate it a lot.

Comment by Ken Williams [ 24/Jun/13 ]

+1. Count me as another vote for making this high-priority. Would be extremely helpful.

Comment by Bob Tiernay [ 15/May/13 ]

This is crucial for any sizeable collection. Otherwise it may require reissuing the aggregation pipeline multiple times in order to page through the results. This is terribly inefficient.

Is there an ETA on this feature?

+2
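To make Bob's efficiency point concrete: without `$out`, paging means re-running the whole pipeline with `$skip`/`$limit` appended, so every page repeats the `$match`/`$group` work. A minimal sketch, assuming a hypothetical helper `page_pipeline` that builds one page's pipeline as pymongo-style dicts (`$skip` and `$limit` are real pipeline stages):

```python
from typing import List


def page_pipeline(base_pipeline: List[dict], page: int,
                  page_size: int) -> List[dict]:
    """Return a copy of base_pipeline with $skip/$limit appended for one page.

    Each call produces a full pipeline, so the server recomputes every
    earlier stage for every page requested.
    """
    return list(base_pipeline) + [
        {"$skip": page * page_size},
        {"$limit": page_size},
    ]
```

With `$out`, the grouped result could instead be written once to a collection and paged with an ordinary `find` on that collection.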

Comment by Keaton Adams [ 07/May/13 ]

So in our current SQL solution, we run a number of INSERT INTO <Summary> SELECT <ID Column>, SUM(VAL1), SUM(VAL2) WHERE <Created_TS> >= 10 min ago GROUP BY <ID Column>; statements.

Instead of a stored proc, I would like to do this in a JSON script called via cron, but it sounds like I need this $out feature to keep things simple by directing the output of aggregate to another collection within the DB. If I understand $out properly, I need it to replicate what I can currently do in SQL.

Thanks for considering it for the next release.
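For readers mapping Keaton's SQL onto the aggregation framework, here is a minimal sketch of how that statement could translate into a pipeline, written as pymongo-style Python dicts. The field names `created_ts`, `id_col`, `val1`, `val2` and the output collection `summary` are hypothetical placeholders; `$match`, `$group`, `$sum` and `$out` are real pipeline operators, but the function itself is only an illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def summary_pipeline(now: Optional[datetime] = None,
                     window: timedelta = timedelta(minutes=10),
                     id_field: str = "id_col",        # hypothetical field
                     ts_field: str = "created_ts",    # hypothetical field
                     out_collection: str = "summary") -> List[dict]:
    """Build a pipeline mirroring INSERT INTO ... SELECT ... GROUP BY."""
    now = now or datetime.now(timezone.utc)
    since = now - window
    return [
        # WHERE <Created_TS> >= 10 min ago
        {"$match": {ts_field: {"$gte": since}}},
        # SELECT <ID Column>, SUM(VAL1), SUM(VAL2) ... GROUP BY <ID Column>
        {"$group": {
            "_id": "$" + id_field,
            "val1": {"$sum": "$val1"},
            "val2": {"$sum": "$val2"},
        }},
        # INSERT INTO <Summary>; $out must be the last stage
        {"$out": out_collection},
    ]
```

With pymongo, the list would be passed to `Collection.aggregate`, e.g. `db.events.aggregate(summary_pipeline())`, where `events` is a hypothetical source collection. One semantic difference: as released in 2.6, `$out` replaces the contents of the target collection rather than appending to it, so this behaves more like truncate-and-insert than a plain INSERT INTO.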

Comment by Jordan Willis [ 06/May/13 ]

This would be huge for me and would really get the bioinformatics community switched over to MongoDB for good. -J

Comment by Bastien Barre [ 15/Mar/13 ]

Really important. We need it to work with a lot of data.

Comment by Arthur Nogueira Neves [ 14/Mar/13 ]

+1, a really important one indeed.

Comment by SaurabhSanthosh [ 10/Dec/12 ]

This is a very important feature that makes the aggregation framework really useful (especially when we are dealing with huge data sets). +1 for this one.

Comment by Shunsuke Mikami [ 15/Oct/12 ]

I agree with Andreas Petersson.
This issue is assigned to 2.3.x, but I wish it were assigned to 2.3.1 or 2.2.2.

Comment by Andreas Petersson [ 05/Oct/12 ]

+1 for this one. This is huge and I can't believe it is not voted higher.
Without this, the aggregation framework can't really be used to its true potential.

Comment by Alex Piggott [ 14/Aug/12 ]

Any chance of being able to use the output of an aggregation to update existing collections in addition to merging/replacing?

I'm looking for a way to write aggregations and other transforms back into existing collections without going through the client or blocking the server. The "recommended" way of doing this is an unplanned improvement to update (https://jira.mongodb.org/browse/SERVER-458) that has been sitting in the queue forever; currently I abuse a bug in map/reduce to do this (https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/AxXY7r6hHc8%5B1-25%5D).

Edit: I just noticed that updating collections from the new aggregation framework is already discussed at the link above: https://jira.mongodb.org/browse/SERVER-458 - semi-apologies for duplicating!

Comment by Chris Westin [ 15/Feb/12 ]

That's not decided yet. The goal is to make it as similar to M/R as it makes sense to do so. Some modes may not make sense.

Comment by Colin Mollenhour [ 14/Feb/12 ]

Will this have both replace and merge modes like M/R?
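For context on Colin's question: mapReduce's `out` option already distinguishes `replace`, `merge` and `reduce` modes. The difference between the first two can be modelled over documents keyed by `_id`. This is a hypothetical sketch (the function `write_output` is made up), and whether `$out` adopts either mode is, per Chris Westin's reply, not decided in this thread.

```python
from typing import Dict, Iterable


def write_output(existing: Iterable[dict], results: Iterable[dict],
                 mode: str) -> Dict[object, dict]:
    """Model M/R-style output modes; returns the final collection by _id.

    replace: the old contents are discarded entirely.
    merge:   old documents survive unless a result has the same _id,
             in which case the result overwrites them.
    """
    if mode == "replace":
        return {doc["_id"]: doc for doc in results}
    if mode == "merge":
        merged = {doc["_id"]: doc for doc in existing}
        merged.update((doc["_id"], doc) for doc in results)
        return merged
    raise ValueError(f"unsupported mode: {mode!r}")
```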

Comment by Chris Westin [ 16/Sep/11 ]

$out either has options, or we support $tee, which has the same options but, unlike $out, also passes documents along to the next stage

  • options to have or not have indexes
  • might want to choose not to have an index if you write to a non-local or non-temp collection
  • when we do this, create indexes after inserting into the collection, if the collection is new

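The notes above distinguish a terminal $out from a pass-through $tee, and suggest building indexes only after inserting into a new collection. A hypothetical sketch of both ideas using Python generators; the stage names mirror the notes, but the functions, the list used as the output collection and the dict used as an index are assumptions for illustration, not MongoDB code.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def tee_stage(docs: Iterable[dict], sink: List[dict]) -> Iterator[dict]:
    """$tee-like: copy each document into sink AND pass it downstream."""
    for doc in docs:
        sink.append(doc)
        yield doc


def out_stage(docs: Iterable[dict], sink: List[dict],
              index_key: Optional[str] = None) -> Dict[object, List[int]]:
    """$out-like: write every document to sink and pass nothing on.

    Assumes sink is a new (empty) collection. If index_key is given, the
    index is built once after all inserts instead of on every insert.
    """
    for doc in docs:
        sink.append(doc)
    index: Dict[object, List[int]] = {}
    if index_key is not None:
        for pos, doc in enumerate(sink):
            index.setdefault(doc.get(index_key), []).append(pos)
    return index
```

Because `tee_stage` is a generator, downstream stages keep consuming its output; `out_stage` is a plain function that ends the pipeline and returns only the optional index.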
Generated at Thu Feb 08 03:02:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.