MapReduce Performance very slow compared to Hadoop


    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 1.8.0
    • Component/s: JavaScript
    • Environment:
      Linux

      I have run into a dilemma with MongoDB. We have been running
      MapReduce benchmarks against Hadoop and have found MongoDB
      to be much slower: 65 minutes vs. 2 minutes for a CPU-intensive
      MapReduce job that breaks up strings and computes word counts
      over a large number of email texts (about 974 MB in total). I sharded the collection
      across 3 servers and used db.printShardingStatus() to verify that it was
      evenly distributed; there are 7/8/7 chunks on the 3 shards.
      The collection is also indexed.
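
For readers unfamiliar with this kind of job, the shape of the word-count workload described above can be sketched as follows. This is a minimal in-memory illustration in Python, not the JavaScript map and reduce functions used in the benchmark; the names `map_words`, `reduce_counts`, and `word_count`, the tokenizing regex, and the sample emails are all assumptions made for the example.

```python
import re
from collections import defaultdict


def map_words(text):
    """Map step: emit a (word, 1) pair for every word in one email body."""
    # Hypothetical tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum the partial counts emitted for one word."""
    return sum(counts)


def word_count(texts):
    """Run map, group the emitted pairs by key, then reduce, in a single process."""
    grouped = defaultdict(list)
    for text in texts:
        for word, count in map_words(text):
            grouped[word].append(count)
    return {word: reduce_counts(word, counts) for word, counts in grouped.items()}


# Two made-up email bodies standing in for the real data set.
emails = ["Hello world", "hello again, world"]
result = word_count(emails)
```

The map step does the string splitting and is the CPU-intensive part; in MongoDB 1.8 that step runs inside the JavaScript engine, which is where we suspect the time is going.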

      We have a couple of questions:

      Is there any alternative to using JavaScript for the Map and Reduce functions from the Java API? We suspect that the JavaScript engine is slowing things down considerably.
      Are there background threads adding overhead that can or should be disabled to speed up MapReduce performance?

      It just seems that this should execute a lot faster.

      Thank you for any help,
      Jim Olson

      Kyle Banker's response to this was:

      "These results aren't surprising. You're right that the JavaScript
      engine is slow (and single-threaded). We're upgrading to V8, which may
      help somewhat, but it still won't be as fast as, say, Hadoop.

      MongoDB 2.0 will have a different, improved aggregation framework that
      doesn't use JS. That will greatly improve aggregation for a lot of use
      cases. I'd recommend that you create a JIRA issue for this use case so
      that we can track interest and make sure that the new framework can
      support it."

      So this is my JIRA ticket.
      Please let me know if I can provide further details.
      Thank you. jamesolson@noviidesign.com

              Assignee:
              Antoine Girbal (Inactive)
              Reporter:
              Jim Olson
              Votes:
              5
              Watchers:
              6