[SERVER-9907] Allow to skip initial count() in mapreduce Created: 11/Jun/13  Updated: 11/Jul/16  Resolved: 27/Aug/13

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: 2.4.7, 2.5.2

Type: Improvement Priority: Major - P3
Reporter: Antoine Girbal Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-12710 Map-Reduce reports incorrect stats in... Closed
Participants:

 Description   
MongoDB Status as of October 9th, 2013

ISSUE SUMMARY
To report the progress of a running mapReduce job, the server first runs the job's input filter query once to count the documents that will be processed. For long-running queries, this extra pass, which is needed only for logging, adds significantly to the overall mapReduce run time.

USER IMPACT
This fix is a performance improvement only. The one visible change is in the log messages written during a mapReduce that uses a filter: instead of a "percentage complete" figure, a running count of documents processed is reported.

SOLUTION
The issue has been resolved by using the total count of documents in the ProgressMeter only when no query filter is given. When a filter is present, the initial count is skipped.
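
The behavior described above can be sketched roughly as follows. This is an illustration in Python, not the server's C++ code: the function names, the callback, and the message format are all hypothetical and chosen only to show the decision between a percentage and a running count.

```python
from typing import Callable, Optional


def progress_total(query_filter: Optional[dict],
                   count_all: Callable[[], int]) -> Optional[int]:
    """Return the total to show in the progress meter, or None to hide it.

    Hypothetical helper: count_all stands in for whatever produces the
    document count when no filter is given.
    """
    if query_filter:
        # A filter is present: counting would mean running the query an
        # extra time, so the total is left unknown.
        return None
    # No filter (None or an empty document): the total is still used.
    return count_all()


def progress_line(done: int, total: Optional[int]) -> str:
    """Format one progress message (the wording here is made up)."""
    if total is None:
        # Total hidden: report only a running count.
        return f"M/R: processed {done} documents"
    percent = 100 * done // total if total else 100
    return f"M/R: {done}/{total} {percent}%"
```

With a filter, `progress_total` never invokes the counting callback, which is the point of the fix: the input query is evaluated once, during the map phase, rather than twice.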

WORKAROUNDS
There is no workaround.

PATCHES
Production release v2.4.7 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Original Description

A significant portion of the map reduce job may be spent actually matching the input documents.
Right now we do an initial count() (line 594 mr.cpp) in order to display the progress meter.

In my production example, about 90% of the time is spent matching the input documents (there is no ideal way to index further), and consequently the wasted initial count() takes about half of the entire job time.

Either:

  • remove the initial count() and have the progress meter just display how many documents have been done instead of the % of completion
  • add an option like "in.showProgress: false" to disable the count().

This map reduce application will have to ingest a large volume of data, and the matching rules are pretty complex, so having that option may save up to 50% of MR execution time.
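
The "up to 50%" estimate can be checked with a small back-of-the-envelope calculation. The sketch below assumes (this is an interpretation, not something the ticket states outright) that the initial count() costs about as much as one matching pass over the input, and that the job then matches the input a second time while mapping. The function name and the sample timings are hypothetical.

```python
def fraction_saved(match_seconds: float, other_seconds: float) -> float:
    """Share of total job time removed by skipping the initial count().

    match_seconds: time for one matching pass over the input.
    other_seconds: everything else (map, reduce, output).
    """
    count_seconds = match_seconds  # assumption: count() ~ one matching pass
    total_with_count = count_seconds + match_seconds + other_seconds
    return count_seconds / total_with_count


# Matching takes 90% of the job: two 45 s passes plus 10 s of other work.
print(f"{fraction_saved(45, 10):.0%}")  # prints 45%
```

Under this model the saving approaches 50% as the non-matching work shrinks toward zero, which is consistent with the figure quoted above.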



 Comments   
Comment by auto [ 02/Oct/13 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-9907 Allow to skip initial count() in mapreduce

Do not count total documents to process if filter is given.
Branch: v2.4
https://github.com/mongodb/mongo/commit/2711fa56d83006ddf91978f8e749074769cc121a

Comment by auto [ 02/Oct/13 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-9907 Allow to skip initial count() in mapreduce

Added option to hide the total in the progress meter.

Conflicts:
src/mongo/util/progress_meter.cpp
Branch: v2.4
https://github.com/mongodb/mongo/commit/eb13f30cec8e9fcecddc0e91e7db85ba033e1415

Comment by auto [ 30/Jul/13 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-9907 Allow to skip initial count() in mapreduce

Do not count total documents to process if filter is given.
Branch: master
https://github.com/mongodb/mongo/commit/9f6cf548d5fe19daf4478e5ffd4072a3993302e0

Comment by auto [ 30/Jul/13 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-9907 Allow to skip initial count() in mapreduce

Added option to hide the total in the progress meter.
Branch: master
https://github.com/mongodb/mongo/commit/0bc4c30550668d547889b80af209f7623e72a1a9

Comment by Daniel Pasette (Inactive) [ 12/Jun/13 ]

Should remove the count in the case that there is a filter. Need to figure out how this will work with the progress meter/logging.
