[SERVER-1197] Performance question regarding map/reduce: map reduce mongo function slower than naive (python) counting Created: 06/Jun/10 Updated: 14/Dec/11 Resolved: 07/Jun/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 1.5.2 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Alan | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
GNU/Linux Ubuntu (Lucid) mongodb-unstable package (version 20100604) |
||
| Attachments: |
|
| Participants: |
| Description |
|
We coded the given map/reduce example (http://api.mongodb.org/python/current/examples/map_reduce.html) directly in Python and got much better performance (see attached script). Did we get anything wrong? Usage of the script (it creates sample data and times the two methods): python mongo_map_reduce_counter.py test_db_name Example output (with nb_objects=5000, nb_tags=200, nb_bins=3): |
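For illustration only (this is not the attached script), a naive client-side count of the kind described above might look like the sketch below. It assumes each document carries a list of strings under a "tags" key, as in the linked PyMongo example; the function name count_tags and the sample documents are hypothetical.

```python
from collections import Counter


def count_tags(documents):
    """Count how often each tag occurs across an iterable of documents."""
    counts = Counter()
    for doc in documents:
        # Assumption: tags live in a list under the "tags" key.
        counts.update(doc.get("tags", []))
    return counts


# Hypothetical sample data standing in for a MongoDB collection.
sample_docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

tag_counts = count_tags(sample_docs)
print(tag_counts.most_common())
```

Against a real database, the same function should accept a PyMongo cursor such as the result of collection.find(), since a cursor yields plain dictionaries; that pairing is an assumption here and was not timed.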
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 14/Dec/11 ] |
|
If you want to diagnose why your case is slow, can you open a new ticket with the map/reduce code and sample data? |
| Comment by Stephen Nelson [ 14/Dec/11 ] |
|
http://shootout.alioth.debian.org/u32/javascript.php Nice try, but JavaScript (V8) is only 3x slower for the type of operations I'm performing. Mongo specifically is adding truly massive overhead to map/reduce operations. The new aggregation framework is not adequate for the things I'm doing (nor, I'm sure, for many other users): my map functions need to make decisions about documents based on non-trivial dependent properties. The aggregation framework will not be able to replace m/r; it's not sufficient, as you seem to be aware from your comments on the linked issue. Has anyone profiled mongo's map/reduce implementation to determine where the overhead is coming from? |
| Comment by Eliot Horowitz (Inactive) [ 14/Dec/11 ] |
|
JavaScript is much slower than Java. |
| Comment by Stephen Nelson [ 13/Dec/11 ] |
|
Why was this issue closed without the problem being corrected? I'm using mongo (version 2.0) for my dissertation research. A mongodb map/reduce function runs between 10 and 100 times slower than a Java implementation which does the same thing. To test this I hand-coded a naive Java implementation of map/reduce which maps by iterating over a collection, performing the map operation, then storing any emits in a temporary collection. I create an index for the temporary collection, then call reduce, which iterates over the temporary collection finding keys, retrieves all entries for that key in batches, stores the result in a new table, and deletes all entries for that key before moving on. When the temporary collection is empty I'm done. This naive approach took my map/reduce function operating on several hundred million documents from many days down to hours. I've since written an implementation which uses a cache in Java and sequential traversal of the temporary collection without deletes, which takes it down by another factor of 10. Why is mongo's implementation so freaking slow? Are you loading an entirely new JavaScript VM for every application of map? Map/reduce's performance is completely at odds with the excellent performance of everything else. |
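The naive approach described in the comment above can be sketched roughly as follows. This is an in-memory Python illustration, not the commenter's Java code: a plain list stands in for the temporary collection, sorting by key stands in for the index, and a single sequential pass groups the emits per key (closer to the second variant, without deletes). All names (naive_map_reduce, map_tags, sum_values) and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def naive_map_reduce(documents, map_fn, reduce_fn):
    """Run map over every document, then reduce the emits key by key."""
    # Map phase: store every (key, value) emit in a temporary list,
    # which stands in for the temporary collection.
    emitted = []
    for doc in documents:
        emitted.extend(map_fn(doc))

    # Sorting by key plays the role of the index on the temporary collection.
    emitted.sort(key=itemgetter(0))

    # Reduce phase: walk the sorted emits sequentially, one key at a time.
    results = {}
    for key, group in groupby(emitted, key=itemgetter(0)):
        results[key] = reduce_fn(key, [value for _, value in group])
    return results


def map_tags(doc):
    """Emit (tag, 1) for every tag in the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_values(key, values):
    """Reduce by summing all values emitted for a key."""
    return sum(values)


# Hypothetical sample data.
docs = [
    {"tags": ["dog", "cat"]},
    {"tags": ["cat"]},
    {"tags": ["mouse", "cat", "dog"]},
]

result = naive_map_reduce(docs, map_tags, sum_values)
print(result)
```

The sketch holds every emit in memory, so it only shows the control flow; at the scale mentioned in the comment the temporary store would have to live on disk or in a database collection.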
| Comment by Alan [ 08/Jun/10 ] |
|
Thanks for your answer. In this particular (and very simple; cf. the source code) example, the speed difference is really large! I guess we will stick to the "naive" Python implementation for now. |
| Comment by Eliot Horowitz (Inactive) [ 07/Jun/10 ] |
|
Python may be faster than map/reduce in some cases. |