[SERVER-4847] Wrong key returned from map/reduce Created: 02/Feb/12  Updated: 29/Feb/12  Resolved: 02/Feb/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Bug Priority: Trivial - P5
Reporter: Jeff lee Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

It looks like there's an optimization for map reduce jobs that can result in the wrong key name being returned if there is only a single tuple to process. Notice how the keys for groups 1 and 5 are returned as "thetot" instead of "tot" in the job below.

I believe this is caused by line 103/124 of db/commands/mr.cpp. The workaround appears easy enough but I thought I'd report it.

Thanks

> db.goo.save({group:1, amt:5});
> db.goo.save({group:2, amt:3});
> db.goo.save({group:2, amt:15});
> db.goo.save({group:3, amt:1});
> db.goo.save({group:3, amt:2});
> db.goo.save({group:3, amt:9});
> db.goo.save({group:4, amt:1});
> db.goo.save({group:4, amt:3});
> db.goo.save({group:4, amt:10});
> db.goo.save({group:4, amt:12});
> db.goo.save({group:5, amt:2});
>
> map = function(){ emit( this.group, { cnt:1, thetot:this.amt } ); }
function () {
    emit(this.group, {cnt:1, thetot:this.amt});
}
>
> reduce = function(k, v){
... var r = { cnt:0, tot:0 };
... v.forEach(function(v){
... r.cnt += v.cnt;
... r.tot += v.thetot ;
... })
... return r;
... }
function (k, v) {
    var r = {cnt:0, tot:0};
    v.forEach(function (v) {r.cnt += v.cnt;r.tot += v.thetot;});
    return r;
}
>
> db.goo.mapReduce( map, reduce, { out: {inline:1} });
{
    "results" : [
        {
            "_id" : 1,
            "value" : { "cnt" : 1, "thetot" : 5 }
        },
        {
            "_id" : 2,
            "value" : { "cnt" : 2, "tot" : 18 }
        },
        {
            "_id" : 3,
            "value" : { "cnt" : 3, "tot" : 12 }
        },
        {
            "_id" : 4,
            "value" : { "cnt" : 4, "tot" : 26 }
        },
        {
            "_id" : 5,
            "value" : { "cnt" : 1, "thetot" : 2 }
        }
    ],
    "timeMillis" : 1,
    "counts" : { "input" : 11, "emit" : 11, "output" : 5 },
    "ok" : 1,
}



 Comments   
Comment by Jeff lee [ 02/Feb/12 ]

Heh...I guess it helps to fully read the docs.

Thanks!

Comment by Eliot Horowitz (Inactive) [ 02/Feb/12 ]

This is correct behavior.

What you emit and what reduce returns have to be in the same format, as things get re-reduced.

So it's possible things won't get reduced at all if there is only 1 entry for a key.

You can use finalize to do final processing per key.
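As a rough sketch of that advice, the job from the description could be rewritten so that map emits the same { cnt, tot } shape that reduce returns. The `avg` field computed in finalize is a hypothetical example of per-key post-processing and is not part of the original report.

```javascript
// Emit values in the same shape that reduce returns: { cnt, tot }.
// A key with only one emitted value may skip reduce entirely, so this
// shape is what appears in the output for such keys.
var map = function () {
    emit(this.group, { cnt: 1, tot: this.amt });
};

// reduce may also be called again on its own earlier output (re-reduce),
// so it reads the same fields that it writes.
var reduce = function (key, values) {
    var r = { cnt: 0, tot: 0 };
    values.forEach(function (v) {
        r.cnt += v.cnt;
        r.tot += v.tot;
    });
    return r;
};

// finalize runs once per key after reducing is complete, whether or not
// reduce was called. The avg field is an illustrative assumption.
var finalize = function (key, value) {
    value.avg = value.tot / value.cnt;
    return value;
};

// In the mongo shell, the job would then be run as:
// db.goo.mapReduce(map, reduce, { out: { inline: 1 }, finalize: finalize });
```

With this change, groups 1 and 5 should come back with a "tot" key like the others, because the emitted value already has the final shape.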

Generated at Thu Feb 08 03:07:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.