[SERVER-970] MapReduce: add ability to process many collections at once Created: 05/Apr/10 Updated: 30/Jun/15 Resolved: 30/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Tools |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor - P4 |
| Reporter: | César D. Rodas | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 6 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
MapReduce would be more useful if it supported many collections as a data source instead of just one. This feature would turn MongoDB into a full replacement for Hadoop in sharded environments and would add the ability to process larger amounts of data. I think it would be even more useful if MapReduce could process collections from different databases as well, because the number of collections in a database is limited by default. |
| Comments |
| Comment by Ian Whalen (Inactive) [ 30/Jun/15 ] |
|
Hi Cesar, thanks a lot for filing this feature request, and my apologies for the time since it was last updated. I'm glad, however, that the changes made in 1.7.4 addressed your use case. With regard to your feature request as a whole, I'm going to close this as a Duplicate and link it to our upcoming $lookup feature. After careful consideration we’ve decided to provide users with the desired functionality via our aggregation pipeline. Please follow along in |
| Comment by César D. Rodas [ 22/Feb/11 ] |
|
Michael, Thanks for the info, it looks pretty useful for my use case! Cheers, |
| Comment by Michael Shapiro [ 22/Feb/11 ] |
|
César, Check out what they've added in >1.7.4, might be what you're looking for. http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Outputoptions |
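For readers following the thread: the output options linked above include a mode that folds new results into an existing output collection by re-applying the reduce function. One way to use that for several source collections is to run one job per collection against the same output. The sketch below only models that idea in plain Python, with dicts and lists standing in for collections. The names `map_reduce_into`, `clicks_a`, and `clicks_b` are hypothetical, and nothing here calls MongoDB or any driver.

```python
from collections import defaultdict


def map_reduce_into(docs, map_fn, reduce_fn, out):
    """Run one map/reduce pass over docs and fold the results into out.

    out is a plain dict standing in for a shared output collection
    (an assumption of this sketch, not a MongoDB API).
    """
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    for key, values in grouped.items():
        result = reduce_fn(key, values)
        if key in out:
            # A key already present in the output is re-reduced with the new value.
            result = reduce_fn(key, [out[key], result])
        out[key] = result
    return out


def map_page(doc):
    # Emit one count per page view.
    yield doc["page"], 1


def reduce_count(key, values):
    # Must give the same answer when fed its own earlier output.
    return sum(values)


# Hypothetical source collections.
clicks_a = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
clicks_b = [{"page": "/home"}]

page_counts = {}
for source in (clicks_a, clicks_b):
    map_reduce_into(source, map_page, reduce_count, page_counts)
```

This only works when the reduce function can safely consume its own earlier output, as a sum can. A reduce that, say, averages raw values would give wrong totals when re-applied.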
| Comment by Michael Shapiro [ 29/Apr/10 ] |
|
I haven't put any proper thought into it, but at worst you could probably create a temporary collection and dump the objects from your target collections into it (possibly with new keys on each object to indicate what collection it came from, if that matters), then just run the MR job against the temp collection. It's a pretty terrible solution, though. |
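The temporary-collection workaround described above can be sketched roughly as follows. This is an in-memory illustration in plain Python, not MongoDB code: lists of dicts stand in for collections, and the names `merge_into_temp`, `map_reduce`, `_source`, `orders_2009`, and `orders_2010` are all made up for the example.

```python
from collections import defaultdict


def merge_into_temp(collections):
    """Copy every document into one list, tagging each copy with its source name."""
    temp = []
    for name, docs in collections.items():
        for doc in docs:
            tagged = dict(doc)  # shallow copy so the original document is untouched
            tagged["_source"] = name  # hypothetical key recording the origin
            temp.append(tagged)
    return temp


def map_reduce(docs, map_fn, reduce_fn):
    """Group the emitted (key, value) pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def map_total(doc):
    # Emit the order total keyed by user.
    yield doc["user"], doc["total"]


def reduce_sum(key, values):
    return sum(values)


# Hypothetical target collections.
collections = {
    "orders_2009": [{"user": "ana", "total": 10}, {"user": "bo", "total": 5}],
    "orders_2010": [{"user": "ana", "total": 7}],
}

temp = merge_into_temp(collections)
totals = map_reduce(temp, map_total, reduce_sum)
```

The cost is the one the comment already points out: every document is copied once before the job runs, so the approach scales poorly with large collections.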
| Comment by César D. Rodas [ 29/Apr/10 ] |
|
Michael, Really? How? I've been thinking about it and I couldn't find a way. Regards, |
| Comment by Michael Shapiro [ 29/Apr/10 ] |
|
I think it'd be a pretty cool feature too. Perhaps for now, you can fake it with db.eval()? |