[SERVER-2333] mapreduce optimization: do not execute reduce on unique keys Created: 05/Jan/11  Updated: 12/Jul/16  Resolved: 25/Jan/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 1.7.5

Type: Improvement Priority: Major - P3
Reporter: Antoine Girbal Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-10736 Modify MapReduce to "map, shuffle, re... Closed
Participants:

 Description   

By accident I had a wrong reduce method:
map = function() { emit(this.ln, 1); }

reduce = function(key, vals) {
    var sum = 0;
    for (var val in vals) { sum += val; }
    return sum;
}

The documents in the collection actually have only 1 entry per this.ln value.
With inline output, the results look like:

{ "_id" : "zzucdarlws", "value" : 1 }

If output goes to a collection, the result is:
> db.output.find({"_id": "zzucdarlws"})

{ "_id" : "zzucdarlws", "value" : "00" }

It looks like the reduce function is never called for inline output, whereas it is called once when output goes to a collection.
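The "00" value is consistent with the reduce function above being called once with a single-element array. A minimal sketch in plain JavaScript, runnable outside MongoDB: for...in iterates over the array's index strings rather than its values, so the numeric sum is concatenated with the string "0". The name reduceFixed is introduced here for illustration only and is not part of the report.

```javascript
// The reduce function from the report, unchanged.
var reduce = function (key, vals) {
  var sum = 0;
  for (var val in vals) { // for...in yields the index strings "0", "1", ...
    sum += val;           // number + string concatenates instead of adding
  }
  return sum;
};

// Illustrative corrected variant: add the values themselves.
var reduceFixed = function (key, vals) {
  var sum = 0;
  for (var i = 0; i < vals.length; i++) {
    sum += vals[i];
  }
  return sum;
};

var buggy = reduce("zzucdarlws", [1]);      // the string "00"
var fixed = reduceFixed("zzucdarlws", [1]); // the number 1
console.log(JSON.stringify(buggy), fixed);
```

With inline output the emitted value 1 is returned as-is, which is why the wrong reduce only showed up when writing to a collection.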



 Comments   
Comment by Brian Johnson [ 10/May/12 ]

I filed https://jira.mongodb.org/browse/SERVER-5818 because I think this "fix" only applies to a very limited set of use cases. For instance, it would be a problem if you were summing values per key. If you only had a single key, you wouldn't get an aggregated count.
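One hypothetical scenario where skipping reduce changes the result, sketched in plain JavaScript. It is not taken from SERVER-5818; it assumes a reduce function that returns a different shape than the values map emits. The helper valueForKey is an illustrative stand-in for the behaviour described in this ticket, not server code.

```javascript
// Hypothetical reduce that returns a different shape than the emitted values.
function reduce(key, vals) {
  return { count: vals.length };
}

// Stand-in for the behaviour in this ticket: reduce is skipped
// when a key has exactly one emitted value.
function valueForKey(key, vals) {
  return vals.length === 1 ? vals[0] : reduce(key, vals);
}

var single = valueForKey("a", ["x"]);        // "x": reduce was skipped
var multiple = valueForKey("b", ["x", "y"]); // { count: 2 }
console.log(single, multiple);
```

If map instead emits values in the same shape that reduce returns (for example { count: 1 }, with reduce summing the counts), both paths produce the same kind of value.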

Comment by auto [ 24/Jan/11 ]

Author: agirbal <antoine@10gen.com>

Message: SERVER-2333: added test
https://github.com/mongodb/mongo/commit/d45a6d5b1a6a9e8d0c8bc4b7bbe324f807e2357f

Comment by auto [ 24/Jan/11 ]

Author: agirbal <antoine@10gen.com>

Message: SERVER-2333: mapreduce: if results are inline vs into collection, it seems the execution may differ (different optimization?)
This was fixed; reduce() now isn't called for unique keys, even when output goes to a collection.

Added many comments all around mr.cpp
https://github.com/mongodb/mongo/commit/58bbafbc57df03bd4cf06a15fa61e57baca59a11

Comment by Antoine Girbal [ 24/Jan/11 ]

Fixed this by not applying reduce() when a key has only 1 object, even for output to a collection.
This should make mr quite a bit faster in some cases, since no JS will be called.
Also added many comments in mr.cpp
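The optimization can be sketched as a small in-memory simulation in plain JavaScript. This is not the code in mr.cpp; runMapReduce is a hypothetical helper, and passing emit to map as an argument is an assumption made to keep the sketch self-contained (in MongoDB, emit is provided by the server).

```javascript
// Hypothetical in-memory stand-in for a map/reduce run; not the server code.
function runMapReduce(docs, map, reduce) {
  var groups = {};
  var reduceCalls = 0;

  // Map phase: collect every emitted value under its key.
  docs.forEach(function (doc) {
    map.call(doc, function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });

  // Reduce phase: skip reduce for keys that have exactly one value.
  var results = Object.keys(groups).map(function (key) {
    var vals = groups[key];
    var value;
    if (vals.length === 1) {
      value = vals[0];            // unique key: no JS reduce call
    } else {
      reduceCalls++;
      value = reduce(key, vals);
    }
    return { _id: key, value: value };
  });

  return { results: results, reduceCalls: reduceCalls };
}

// Every document has a distinct ln, as in the report.
var docs = [{ ln: "a" }, { ln: "b" }, { ln: "c" }];
var out = runMapReduce(
  docs,
  function (emit) { emit(this.ln, 1); },
  function (key, vals) {
    var sum = 0;
    for (var i = 0; i < vals.length; i++) { sum += vals[i]; }
    return sum;
  }
);
console.log(out.reduceCalls, JSON.stringify(out.results));
```

In this sketch reduce is never invoked when all keys are unique, which is the case the comment above expects to be faster.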
