[SERVER-970] MapReduce: add ability to process many collections at once Created: 05/Apr/10  Updated: 30/Jun/15  Resolved: 30/Jun/15

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: César D. Rodas Assignee: Unassigned
Resolution: Duplicate Votes: 6
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-19095 $lookup Closed
Participants:

 Description   

MapReduce would be more useful if it supported many collections as a data source instead of just one. This feature would turn MongoDB into a full replacement for Hadoop in sharded environments, and would add the capability to process larger amounts of data.

I think it would be even more useful if MapReduce could process collections from different databases as well, because the number of collections in a database is limited by default.



 Comments   
Comment by Ian Whalen (Inactive) [ 30/Jun/15 ]

Hi Cesar, thanks a lot for filing this feature request and my apologies for the time since it was last updated. I'm glad, however, that the changes made in 1.7.4 addressed your use case.

With regard to your feature request as a whole, I'm going to close this as a Duplicate and link it to our upcoming $lookup feature. After careful consideration we've decided to provide users with the desired functionality via our aggregation pipeline. Please follow along in SERVER-19095 for further details of the $lookup implementation.

Comment by César D. Rodas [ 22/Feb/11 ]

Michael,

Thanks for the info, it looks pretty useful for my use case!

Cheers,

Comment by Michael Shapiro [ 22/Feb/11 ]

César,

Check out what they've added in >1.7.4; it might be what you're looking for.

http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions
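The comment does not say which output option is meant. Assuming it is the "reduce" output mode, where each job's results are folded into an existing output collection by re-applying the reduce function, the idea can be sketched in memory as below. This is only an illustration of the folding logic, not server code: the arrays, field names, and the `mapReduceInto` helper are hypothetical, and the approach only works when the reduce function can safely be re-applied to its own output.

```javascript
// In-memory sketch only: arrays stand in for collections, a Map stands in
// for the shared output collection. All names here are hypothetical.
function mapReduceInto(output, docs, map, reduce) {
  // Group emitted values by key for this one "collection".
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Fold this run's results into the shared output, re-reducing on key clashes.
  for (const [key, values] of groups) {
    const fresh = reduce(key, values);
    output.set(key, output.has(key) ? reduce(key, [output.get(key), fresh]) : fresh);
  }
  return output;
}

const pageViews2010 = [{ page: "/home", hits: 3 }, { page: "/docs", hits: 1 }];
const pageViews2011 = [{ page: "/home", hits: 4 }];

const mapHits = (doc, emit) => emit(doc.page, doc.hits);
const sumHits = (key, values) => values.reduce((a, b) => a + b, 0);

// One run per source, all writing into the same output.
const merged = new Map();
mapReduceInto(merged, pageViews2010, mapHits, sumHits);
mapReduceInto(merged, pageViews2011, mapHits, sumHits);
console.log(Object.fromEntries(merged));
```

Running one job per source collection this way avoids copying the input documents, at the cost of one job per collection.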

Comment by Michael Shapiro [ 29/Apr/10 ]

I haven't put any proper thought into it, but at worst you could probably create a temporary collection and dump the objects from your target collections into it (possibly with new keys on each object to indicate what collection it came from, if that matters), then just run the MR job against the temp collection.

It's a pretty terrible solution, though.
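The temporary-collection workaround described above can be sketched roughly as follows. This is an in-memory illustration only: plain arrays stand in for collections, nothing here talks to a server, and the collection names, the `_source` key, and the `buildTemp` and `mapReduce` helpers are all hypothetical.

```javascript
// In-memory sketch only: plain arrays stand in for collections.
// Collection names and fields are hypothetical.
const collections = {
  orders2009: [{ customer: "a", total: 10 }, { customer: "b", total: 5 }],
  orders2010: [{ customer: "a", total: 7 }],
};

// Step 1: copy every document into one temporary array, adding a key
// that records which collection it came from.
function buildTemp(sources) {
  const temp = [];
  for (const [name, docs] of Object.entries(sources)) {
    for (const doc of docs) {
      temp.push({ ...doc, _source: name });
    }
  }
  return temp;
}

// Step 2: a minimal map/reduce over the temporary array.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const temp = buildTemp(collections);
const totals = mapReduce(
  temp,
  (doc, emit) => emit(doc.customer, doc.total),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(totals);
```

The obvious cost, as the comment notes, is that every source document is copied once before the job even starts.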

Comment by César D. Rodas [ 29/Apr/10 ]

Michael,

Really? How? I've been thinking about it and I couldn't find a way.

Regards,

Comment by Michael Shapiro [ 29/Apr/10 ]

I think it'd be a pretty cool feature too.

Perhaps for now, you can fake it with db.eval()?
