[SERVER-6869] MapReduce with MongoDB really, really slow (30 hours vs 20 minutes in MySQL for a equivalent database) Created: 27/Aug/12 Updated: 15/Feb/13 Resolved: 27/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.2.0-rc0 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | José Manuel Ciges Regueiro | Assignee: | Ben Becker |
| Resolution: | Done | Votes: | 0 |
| Labels: | mapreduce | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: Suse Linux Enterprise Server 10 (Virtual Server on Xen) |
||
| Participants: |
| Description |
|
I am doing now some data analyze tests and in the first, really simple I have got very strange results. The idea is the following: from an internet access log (a collection with a document for each access, for the tests 90 millions of documents). I want to get the number of access by domain (what will be a GROUP BY in MySQL), and get the 10 most accessed domains The script I have made in JavaScript is really simple :
The equivalent in MySQL is :
Well, MongoDB takes 30 hours to get the results and MySQL 20 minutes! After reading a little I have arrived to the conclusion that for data analyze we will have to use Hadoop as MongoDB is really slow. What am I doing wrong? Are this results normal? Should I use Hadoop? |
| Comments |
| Comment by Ben Becker [ 27/Aug/12 ] | |||||||||
|
Hi Jose, The aggregate command does (roughly) the equivalent of what MySQL does when running a query like this. MapReduce requires processing every document through Javascript. At this time the default javascript engine (SpiderMonkey) is single-threaded, however we're planning to replace the default engine a faster, multithreaded one (v8) in version 2.3/2.4. You can already build v2.0/v2.2 with v8 by passing the --usev8 flag if you would like to test. It sounds like the server was processing roughly 833 documents/second, which is a bit slow, but not entirely unexpected. I'm going to go ahead and close this issue out, but please feel free to reopen if you have further questions or issues. Also, the mongodb-user group and IRC are good places to get quick feedback if you run into questions/issues like this. Best Regards, | |||||||||
| Comment by José Manuel Ciges Regueiro [ 27/Aug/12 ] | |||||||||
|
Apparently using the group function on Aggregation Framework works well! The following Javascript code gets the 10 most visited domains with their visits in 17m17s!
Anyway I still don't understand why the MapReduce alternative is so slow. Is this a bug or is the way it works by now? |