Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.4.7, 2.5.2
Affects Version/s: None
Component/s: MapReduce
Labels: None
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

MongoDB Status as of October 9th, 2013

ISSUE SUMMARY
To report the progress of a running mapReduce job, the server first runs the job's input filter query as a count to get the total number of documents affected. For long-running queries, this extra step, which exists only to support progress logging, adds significantly to the overall mapReduce run time.

USER IMPACT
This fix is a performance improvement only. The log messages written during a mapReduce change when a filter is used: instead of a "percentage complete" figure, a running count of documents processed is reported.

SOLUTION
The issue has been resolved by using the total document count in the ProgressMeter only when no query filter is used.
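The behavior described above can be sketched roughly as follows. This is only an illustration, not the code in mr.cpp: ProgressSketch, makeProgress, hasFilter, and collectionCount are hypothetical names, and the message formats are made up for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a progress meter; NOT the server's ProgressMeter class.
struct ProgressSketch {
    bool hasTotal = false;    // true only when the total is cheap to obtain
    std::uint64_t total = 0;  // total input documents, when known
    std::uint64_t done = 0;   // documents processed so far

    // Renders "done/total (pct%)" when the total is known,
    // otherwise a plain running count.
    std::string render() const {
        if (hasTotal && total > 0) {
            std::uint64_t pct = done * 100 / total;
            return std::to_string(done) + "/" + std::to_string(total) +
                   " (" + std::to_string(pct) + "%)";
        }
        return std::to_string(done) + " documents processed";
    }
};

// Assumption: the caller knows whether the job has a filter and can supply
// the collection-wide count without running the filter query.
ProgressSketch makeProgress(bool hasFilter, std::uint64_t collectionCount) {
    ProgressSketch p;
    if (!hasFilter) {
        // No filter: the total is known, so a percentage can be shown.
        p.hasTotal = true;
        p.total = collectionCount;
    }
    // With a filter: skip the expensive count and report a running count only.
    return p;
}
```

In this sketch a filtered job never pays for an up-front count, at the cost of losing the percentage in its progress output.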

WORKAROUNDS
There is no workaround.

PATCHES
Production release v2.4.7 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Original Description

A significant portion of the map reduce job may be spent actually matching the input documents.
Right now we do an initial count() (line 594 mr.cpp) in order to display the progress meter.

In my production example, about 90% of the time is spent matching the input documents (there is no ideal way to index further), and consequently the wasted initial count() takes half of the entire job time.

Either:

remove the initial count(), so that progress meters just display how many documents have been processed instead of % of completion, or
add an option like "in.showProgress: false" to disable the count().

This map reduce application will have to ingest a large volume of data, and the matching rules are pretty complex, so having that option may save up to 50% of MR execution time.
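As a rough check of the "up to 50%" estimate, the arithmetic can be sketched as below. It assumes, purely for illustration, that the initial count() costs about as much as one matching pass over the input; countShare and matchFraction are hypothetical names, not anything from the server.

```cpp
// Fraction of the total run time spent in the up-front count(), under the
// assumption (not a measurement) that the count costs as much as one
// matching pass.
// matchFraction: share of the job, excluding the count, spent matching input.
double countShare(double matchFraction) {
    // Job without count = 1 unit; count adds matchFraction units on top.
    return matchFraction / (1.0 + matchFraction);
}
```

Under that assumption, a job that spends 90% of its time matching gives countShare(0.9) = 0.9 / 1.9, a little under half of the total run time, which is in line with the figure reported above.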

is related to

SERVER-12710 Map-Reduce reports incorrect stats in db.currentOp

Closed

Assignee: Randolph Tan
Reporter: Antoine Girbal (Inactive)
Participants: Antoine Girbal, auto, Daniel Pasette, Randolph Tan
Votes: 0
Watchers: 7

Created: Jun 11 2013 10:09:18 PM UTC
Updated: Jul 11 2016 05:39:16 PM UTC
Resolved: Aug 27 2013 02:07:29 PM UTC
