[SERVER-1197] Performance question regarding map/reduce: map reduce mongo function slower than naive (python) counting Created: 06/Jun/10  Updated: 14/Dec/11  Resolved: 07/Jun/10

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 1.5.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Alan Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

GNU/Linux Ubuntu (Lucid) mongodb-unstable package (version 20100604)


Attachments: File mongo_map_reduce_counter.py    
Participants:

 Description   

We coded the given map/reduce example (http://api.mongodb.org/python/current/examples/map_reduce.html) directly in Python and got much better performance (see the attached script). Did we get anything wrong?

Usage of the script (will create sample data and time the two methods):

python mongo_map_reduce_counter.py test_db_name

Example output (with nb_objects=5000, nb_tags=200, nb_bins=3):
$>python mongo_map_reduce_counter.py test_sdsd
calc naive time 0.317932844162
calc map_reduce time 110.605533838
Same results? True
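
For readers without the attachment, the comparison can be sketched roughly as follows. This is not the attached script: the sample documents and the function names are hypothetical, and both variants run in plain Python over an in-memory list rather than against a MongoDB server. It only assumes, as in the linked example, that each document carries a `tags` list.

```python
from collections import Counter

# Hypothetical sample documents; the attached script generates its own data.
docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

def count_tags_naive(documents):
    """Count tag occurrences in a single client-side pass."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.get("tags", []))
    return dict(counts)

def count_tags_map_reduce(documents):
    """Same count, structured as explicit map and reduce phases."""
    emitted = {}
    for doc in documents:
        # map phase: emit (tag, 1) for every tag in the document
        for tag in doc.get("tags", []):
            emitted.setdefault(tag, []).append(1)
    # reduce phase: sum the emitted values for each key
    return {tag: sum(values) for tag, values in emitted.items()}

print("Same results?", count_tags_naive(docs) == count_tags_map_reduce(docs))
```

Both functions do the same amount of counting work here, so the timing gap reported above presumably comes from how the server executes the map/reduce job rather than from the counting itself.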



 Comments   
Comment by Eliot Horowitz (Inactive) [ 14/Dec/11 ]

If you want to diagnose why your case is slow, can you open a new ticket with the map/reduce code and sample data?

Comment by Stephen Nelson [ 14/Dec/11 ]

http://shootout.alioth.debian.org/u32/javascript.php

Nice try, but javascript (v8) is only 3x slower for the type of operations I'm performing. Mongo specifically is adding truly massive overhead to map/reduce operations.

The new aggregation framework is not adequate for the things I'm doing (and I'm sure many other users) - my map functions need to make decisions about documents based on non-trivial dependent properties. The aggregation framework will not be able to replace m/r; it's not sufficient, as you seem to be aware from your comments on the linked issue.

Has anyone profiled mongo's map/reduce implementation to determine where the overhead is coming from?

Comment by Eliot Horowitz (Inactive) [ 14/Dec/11 ]

Javascript is much slower than java.
If it comes down to that - java will always win.
The new aggregation framework is the long term solution. SERVER-447

Comment by Stephen Nelson [ 13/Dec/11 ]

Why was this issue closed without the problem being corrected? I'm using mongo (version 2.0) for my dissertation research. A mongodb map/reduce function runs between 10 and 100 times slower than a Java implementation which does the same thing.

To test this I hand-coded a naive java implementation of map/reduce which maps by iterating over a collection, performing the map operation, then storing any emits in a temporary collection. I create an index for the temporary collection, then call reduce which iterates over the temporary collection finding keys, retrieves all entries for that key in batches, stores the result in a new table, and deletes all entries for that key before moving on. When the temporary collection is empty I'm done.
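
The steps described above can be sketched roughly as follows. This is an illustration in Python, not the commenter's Java code: a plain list stands in for the temporary collection, sorting stands in for the index, and `naive_map_reduce`, `map_fn` and `reduce_fn` are hypothetical names. Batching of the per-key retrieval is omitted.

```python
def naive_map_reduce(collection, map_fn, reduce_fn):
    """Map every document, then reduce key by key, deleting as we go."""
    # Map phase: store every emitted (key, value) pair in a temporary store.
    temp = []
    for doc in collection:
        for key, value in map_fn(doc):
            temp.append((key, value))

    # Stand-in for creating an index on the key of the temporary collection.
    temp.sort(key=lambda pair: pair[0])

    # Reduce phase: pick a key, fetch all of its entries, store the result,
    # then delete those entries before moving on to the next key.
    results = {}
    while temp:
        key = temp[0][0]
        values = [v for k, v in temp if k == key]
        results[key] = reduce_fn(key, values)
        temp = [(k, v) for k, v in temp if k != key]
    return results


# Hypothetical usage: count tags across a few sample documents.
sample = [{"tags": ["dog", "cat"]}, {"tags": ["cat"]}]
tag_counts = naive_map_reduce(
    sample,
    map_fn=lambda doc: [(tag, 1) for tag in doc["tags"]],
    reduce_fn=lambda key, values: sum(values),
)
print(tag_counts)
```

The loop ends when the temporary store is empty, matching the "when the temporary collection is empty I'm done" condition above.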

This naive approach took my map/reduce function operating on several hundred million documents from many days down to hours. I've since written an implementation which uses a cache in Java and sequential traversal of the temporary collection without deletes, which takes it down by another factor of 10.
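
One way such a cached, delete-free variant might look is sketched below in Python. The comment does not say what the cache holds, so this sketch assumes it buffers emitted values per key and pre-reduces them before they are written out; that assumption only works if `reduce_fn` can safely be applied to its own output. All names, including `cache_limit`, are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def cached_map_reduce(collection, map_fn, reduce_fn, cache_limit=1000):
    """Buffer emits in memory, then reduce in one sorted sequential pass."""
    temp = []    # stand-in for the temporary collection
    cache = {}   # assumed role of the cache: key -> buffered values

    def flush():
        # Pre-reduce each cached key and write one entry per key.
        for key, values in cache.items():
            temp.append((key, reduce_fn(key, values)))
        cache.clear()

    for doc in collection:
        for key, value in map_fn(doc):
            cache.setdefault(key, []).append(value)
        if len(cache) >= cache_limit:
            flush()
    flush()

    # Sequential traversal in key order; nothing is deleted along the way.
    temp.sort(key=itemgetter(0))
    return {
        key: reduce_fn(key, [value for _, value in group])
        for key, group in groupby(temp, key=itemgetter(0))
    }


# Hypothetical usage; a tiny cache_limit forces several flushes.
sample_docs = [{"tags": ["dog", "cat"]}, {"tags": ["cat"]}, {"tags": ["mouse", "cat", "dog"]}]
cached_counts = cached_map_reduce(
    sample_docs,
    map_fn=lambda doc: [(tag, 1) for tag in doc["tags"]],
    reduce_fn=lambda key, values: sum(values),
    cache_limit=1,
)
print(cached_counts)
```

Grouping sorted entries in a single pass avoids the repeated find-and-delete round trips of the first approach, which is plausibly where part of the reported speedup comes from.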

Why is mongo's implementation so freaken slow? Are you loading an entirely new javascript VM for every application of map? Map/reduce's performance is completely at odds with the excellent performance of everything else.

Comment by Alan [ 08/Jun/10 ]

Thanks for your answer. In this particular (and very simple, cf. the source code) example, the speed difference is really large! I guess we will stick with the "naive" Python implementation for now.

Comment by Eliot Horowitz (Inactive) [ 07/Jun/10 ]

Python may be faster than map/reduce for some cases.
We are going to be working on m/r performance later this year.
