[SERVER-5149] Auto-generated short-lived collections for output from $merge or $out
| Created: | 29/Feb/12 | Updated: | 06/Dec/22 | Resolved: | 04/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, MapReduce |
| Affects Version/s: | 1.8.0, 2.0.0 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Andrew Morrow (Inactive) | Assignee: | Backlog - Query Execution |
| Resolution: | Done | Votes: | 1 |
| Labels: | query-44-grooming |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Execution |
| Participants: |
| Description |
|
It would be useful to be able to store the results of your computations in the server for future inspection, in a way that would go away after some period of time (perhaps when the session ends?), without having to choose unique names.
Original Description
With mongo 1.8, the ability to have a mapreduce write its results to a temporary table was removed, according to the documentation here: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions. This means that clients that want 'one shot' MR operations are forced either to use inline output or to manage the naming, creation, and reaping of the output collection themselves. If, as is very often the case, the total size of the data that will be created exceeds the current maximum BSON object size (16MB these days), then you cannot use inline output, and the only option is for the client to manually manage the lifetime of the output collection. However, this is not as easy as it sounds:
Overall, I'm perplexed why this very useful feature was removed. It takes away something that the server should be easily able to do with very high reliability, and forces clients to make complex (but ultimately hopeless) efforts to re-implement the feature themselves. |
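For illustration, the client-side lifetime management described above might look roughly like the following minimal sketch. It assumes a PyMongo `Database` object and uses an aggregation `$out` stage rather than mapReduce output; the helper names `temp_output_collection` and `run_one_shot` and the `tmp_out_` prefix are hypothetical and do not come from this ticket.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def temp_output_collection(db, prefix="tmp_out_"):
    """Yield a unique collection name and drop that collection on exit.

    `db` is assumed to be a pymongo Database; only drop_collection is used.
    """
    # Hypothetical naming scheme: a fixed prefix plus a random UUID.
    name = f"{prefix}{uuid.uuid4().hex}"
    try:
        yield name
    finally:
        # Best-effort cleanup only: if the client process dies before
        # this line runs, the collection is left behind on the server.
        db.drop_collection(name)


def run_one_shot(db, source, pipeline):
    """Run `pipeline` on `source`, write the output to a temporary
    collection via $out, and return the documents as a list."""
    with temp_output_collection(db) as out_name:
        db[source].aggregate(list(pipeline) + [{"$out": out_name}])
        # Materialize the results before the collection is dropped.
        return list(db[out_name].find())
```

Even in this small form, the client has to pick a name, ensure cleanup on every exit path, and read the results before the drop. Nothing here cleans up after a crashed client, which a server-managed temporary collection could do.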
| Comments |
| Comment by Esha Bhargava [ 04/Feb/22 ] |
|
Closing these tickets as part of the deprecation of mapReduce. |
| Comment by Pawel Terlecki [ 18/Jan/20 ] |
|
Support for temp collections is fundamental if one wants Mongo to act as a node in any cross-database federated processing. For example, ETL in Alteryx, Tableau and basically any other cross-db processing engine is the only way to move a piece of data across databases, e.g. for joins. Without this feature, data from Mongo will always need to be extracted fully for processing, even if most of the data in the scenario is in Mongo. This will happen in live federated models involving Mongo and other databases in Tableau. Some data blending scenarios will be completely unavailable. In addition, large filters are often externalized to temp collections for fast filtering. E.g. Tableau is extremely slow against data sources that do not support temp tables. In our case, we first need to fix our lookups with SBE to actually be faster in this scenario.
|
| Comment by Charlie Swanson [ 22/Aug/19 ] |
|
Converting this to a feature request since it's been long enough since that release that I wouldn't consider this a bug anymore. |
| Comment by Asya Kamsky [ 22/Mar/19 ] |
|
acm, is this an issue? The way mr (and agg) output works now, it's written to a temp collection that's then renamed, so is that sufficient? I'm not sure what the pre-1.8 behavior was. |
| Comment by Eliot Horowitz (Inactive) [ 01/Mar/12 ] |
|
The major problem was that the previous incarnation didn't make a lot of sense as designed. Agree it is a good feature, just not sure on all the design issues at this point. |