[SERVER-3055] MapReduce Performance very slow compared to Hadoop Created: 06/May/11 Updated: 29/Feb/12 Resolved: 02/Dec/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript |
| Affects Version/s: | 1.8.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jim Olson | Assignee: | Antoine Girbal |
| Resolution: | Duplicate | Votes: | 5 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux |
| Participants: |
| Description |
|
I have run into a dilemma with MongoDB. We have been running MapReduce word-count jobs over a collection of email texts through the Java API, and the same work runs dramatically faster under Hadoop.

Basically we have a couple of questions: Is there any alternative to using JavaScript for the Map and Reduce functions from the Java API? We think that the JavaScript may be slowing things down a lot. It just seems that this should execute a lot faster.

Thank you for any help.

Kyle Banker's response to this was: "These results aren't surprising. You're right that the JavaScript engine is likely what is slowing things down. MongoDB 2.0 will have a different, improved aggregation framework that should be much faster for this kind of work."

So this is my JIRA ticket. |
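For reference, a minimal sketch of how a word-count job like this is typically submitted through the Java driver of that era (2.x); the collection name `emails`, the field name `text`, and the output collection `word_counts` are illustrative assumptions, not details from this ticket:

```java
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.Mongo;

public class WordCountMR {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("mail");
        DBCollection emails = db.getCollection("emails");

        // The map and reduce bodies are sent to the server as JavaScript source strings.
        String map = "function() {"
                   + "  this.text.split(/\\s+/).forEach(function(w) { emit(w, 1); });"
                   + "}";
        String reduce = "function(key, values) { return Array.sum(values); }";

        MapReduceCommand cmd = new MapReduceCommand(
                emails, map, reduce, "word_counts",
                MapReduceCommand.OutputType.REPLACE, null);

        MapReduceOutput out = emails.mapReduce(cmd);
        System.out.println(out.getCommandResult());
    }
}
```

Passing the map and reduce functions as JavaScript source is the dependency the ticket is asking about; the server evaluates those strings in its embedded JavaScript engine for every document.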
| Comments |
| Comment by Antoine Girbal [ 02/Dec/11 ] |
|
the speed will be improved by the switch to v8 (should be 2-3x faster), so marking as duplicate. Please reopen if more questions or post on mongodb-user group for troubleshooting MR. |
| Comment by Antoine Girbal [ 12/Oct/11 ] |
|
Jim,
|
| Comment by Jim Olson [ 10/May/11 ] |
|
I got the bigger job to complete by adding a line of JavaScript to the Map function to only emit if the word length is greater than 1. The job then completed in 20.3 minutes. But why were the exceptions occurring above? It was counting an additional 37 words (a-z, 0-9, and the empty string) and the empty string had about 13 M occurrences. |
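For illustration, that one-line change would look something like the following in the JavaScript passed from the Java side (the `text` field name is an assumption):

```java
// Map function with the added length check; single-character tokens
// (a-z, 0-9) and the empty string are no longer emitted.
String map = "function() {"
           + "  this.text.split(/\\s+/).forEach(function(w) {"
           + "    if (w.length > 1) { emit(w, 1); }"
           + "  });"
           + "}";
```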
| Comment by Jim Olson [ 09/May/11 ] |
|
The same abend happened the 2nd time. The 261277 is not the number of distinct words (that is 167258); I don't know what the 261277 represents. I can see from the first (successful, shorter) job that the largest count was for the empty string, at 13205099, which when multiplied by 10 (for the big set) would yield 132050990, or 7DEF02E hex. I was thinking the count might have exceeded the range of an integer, but it shouldn't have. |
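A quick check of that arithmetic, confirming the scaled count stays well inside a signed 32-bit integer:

```java
public class CountCheck {
    public static void main(String[] args) {
        long scaled = 13205099L * 10;
        System.out.println(scaled);                       // 132050990
        System.out.println(Long.toHexString(scaled));     // 7def02e
        System.out.println(scaled <= Integer.MAX_VALUE);  // true (max is 2147483647)
    }
}
```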
| Comment by Jim Olson [ 09/May/11 ] |
|
The big job abended with an odd error. What is confusing to me is that the decimal and hex sizes reported in the error don't match. The 261677 is the number of distinct words in the data set. The data is a 974 MB collection of email texts spread over 3 sharded servers. |
| Comment by Eliot Horowitz (Inactive) [ 09/May/11 ] |
|
Should also be a lot faster in 2.0, so should see what happens there. |
| Comment by Jim Olson [ 09/May/11 ] |
|
Eliot, thanks. I tried this and it works. It's about 50% faster. |
| Comment by Eliot Horowitz (Inactive) [ 09/May/11 ] |
|
It looks like you basically wrote your own map/reduce engine inside of the map/reduce engine. Try a map function that simply splits the text and emits each word with a count of 1, and a reduce function that sums the values for each key. |
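A sketch of the kind of map and reduce functions this comment appears to be suggesting, written as the JavaScript strings a Java driver program would pass in (the `text` field name is an assumption):

```java
// Map: emit (word, 1) for every token; let the reduce phase do the counting.
String map =
      "function() {"
    + "  this.text.split(/\\s+/).forEach(function(w) {"
    + "    emit(w, 1);"
    + "  });"
    + "}";

// Reduce: sum the counts emitted for each word.
String reduce =
      "function(key, values) {"
    + "  var total = 0;"
    + "  for (var i = 0; i < values.length; i++) { total += values[i]; }"
    + "  return total;"
    + "}";
```

Because reduce may be re-invoked on partial results, it has to return a value of the same shape that map emits; a plain numeric sum satisfies that.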
| Comment by Jim Olson [ 09/May/11 ] |
|
Eliot, the map function is:

function () {
    ...
}

The reduce function is:

function(key, values) {
    ...
}

The rest is just a simple Java program that creates a MapReduceCommand object on the collection and runs it. Hope this helps. I looked it over and I don't think there are any typos.

Regards,
Jim |
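Purely as an illustration of the pattern Eliot's reply above refers to ("your own map/reduce engine inside of the map/reduce engine"), and not the reporter's actual code, a map that pre-aggregates counts per document before emitting might look like this (the `text` field name is an assumption):

```java
// Builds a per-document word-count table first, then emits each
// (word, count) pair once instead of emitting every token separately.
String map = "function() {"
           + "  var counts = {};"
           + "  this.text.split(/\\s+/).forEach(function(w) {"
           + "    counts[w] = (counts[w] || 0) + 1;"
           + "  });"
           + "  for (var w in counts) { emit(w, counts[w]); }"
           + "}";
```

Pre-aggregating reduces the number of emitted pairs, but all of the tokenizing and counting still runs inside the server's JavaScript engine, which is the bottleneck this ticket is about.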
| Comment by Eliot Horowitz (Inactive) [ 09/May/11 ] |
|
First, one option is to use Hadoop for the processing, with the data input and output in MongoDB. Second, can you send the code you're using? Also, the new aggregation framework might make things much faster. |
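A rough sketch of that first option (doing the processing in Hadoop while reading from and writing back to MongoDB). This assumes the mongo-hadoop connector; the class names, the mongo.input.uri / mongo.output.uri settings, the key/value types shown, and the field name `text` are all assumptions for illustration, not something specified in this ticket:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoBackedWordCount {

    // Mapper reads BSON documents from MongoDB and emits (word, 1) pairs.
    public static class TokenMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object key, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            Object text = doc.get("text");
            if (text == null) return;
            for (String w : text.toString().split("\\s+")) {
                if (w.length() > 1) ctx.write(new Text(w), ONE);
            }
        }
    }

    // Reducer sums the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable c : counts) total += c.get();
            ctx.write(word, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read the email documents from MongoDB and write the counts back to it.
        conf.set("mongo.input.uri", "mongodb://localhost:27017/mail.emails");
        conf.set("mongo.output.uri", "mongodb://localhost:27017/mail.word_counts");

        Job job = new Job(conf, "mongo-backed word count");
        job.setJarByClass(MongoBackedWordCount.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The appeal of this arrangement is that the word splitting and counting run as ordinary JVM code across Hadoop's worker slots, so the server-side JavaScript engine is taken out of the hot path entirely.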