- Type: Improvement
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: 1.8.0
- Component/s: JavaScript
- Environment: Linux
I have run into a dilemma with MongoDB. We have been running some
MapReduce benchmarks against Hadoop and have found MongoDB to be
much slower than Hadoop: 65 minutes versus 2 minutes for a CPU-intensive
MapReduce job that basically breaks up strings and computes word counts
over a large number of email texts (about 974 MB worth). I sharded the
collection across 3 servers and verified with db.printShardingStatus()
that it was evenly distributed; there are 7/8/7 chunks on the 3 shards.
The collection is also indexed.
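For reference, the word-count job above can be sketched as a MongoDB-style map/reduce pair. This is an illustrative standalone harness, not our production code: the field name `body` and the tokenization rule are assumptions, and unlike the server (where `emit` is a global and the document is `this`), the harness passes the document and `emit` explicitly so the sketch runs on its own.

```javascript
// Map: tokenize a document's text and emit (word, 1) pairs.
// (In the mongo shell, the document would be `this` and emit a global.)
function map(doc, emit) {
  doc.body.toLowerCase().split(/\W+/).forEach(function (word) {
    if (word.length > 0) emit(word, 1);
  });
}

// Reduce: sum the counts emitted for a single word.
function reduce(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Minimal local driver: group emitted pairs by key, then reduce each group.
function mapReduce(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var out = {};
  Object.keys(groups).forEach(function (key) {
    out[key] = reduce(key, groups[key]);
  });
  return out;
}

var counts = mapReduce([
  { body: "Hello world" },
  { body: "hello again, world" }
]);
console.log(counts); // { hello: 2, world: 2, again: 1 }
```

The real job is identical in shape; the cost we are measuring comes from running the map and reduce functions inside the server's JavaScript engine over ~974 MB of documents.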
Basically we have a couple of questions:
- Is there any alternative to using JavaScript for the Map and Reduce functions from the Java API? We suspect the JavaScript is slowing things down considerably.
- Are there other overhead threads running that can or should be disabled to speed up MapReduce performance?
It just seems that this should execute much faster.
Thank you for any help,
Jim Olson
Kyle Banker's response to this was:
"These results aren't surprising. You're right that the JavaScript
engine is slow (and single-threaded). We're upgrading to V8, which may
help somewhat, but it still won't be as fast as, say, Hadoop.
MongoDB 2.0 will have a different, improved aggregation framework that
doesn't use JS. That will greatly improve aggregation for a lot of use
cases. I'd recommend that you create a JIRA issue for this use case so
that we can track interest and make sure that the new framework can
support it."
So this is my JIRA ticket.
Please let me know if I can provide further details.
Thank you. jamesolson@noviidesign.com