[SERVER-5155] MapReduce weird behavior with/without query Created: 01/Mar/12  Updated: 15/Aug/12  Resolved: 15/Mar/12

Status: Closed
Project: Core Server
Component/s: Shell
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Damien Viel Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

centOs


Participants:

 Description   

Hi All,

I'm seeing strange behavior with mapReduce.

On a collection with around 3,000,000 documents, I run this mapReduce command:

map = function() { 
  day = this.creationDate.getDate() + '/' + this.creationDate.getMonth() + '/' + this.creationDate.getFullYear(); 
  emit({day: day, playerId: this.playerId, rapportStatus:this.rapportStatus}, 1); 
} 
 
reduce = function(key, values) { 
  var count = 0; 
  for ( var i=0; i<values.length; i++ ) { count = count+1; } 
  return count; 
} 
 
> db.report.mapReduce(map,reduce,{out: 'stats6'}) 
{ 
        "result" : "stats6", 
        "timeMillis" : 366532, 
        "counts" : { 
                "input" : 3047542, 
                "emit" : 3047542, 
                "reduce" : 708187, 
                "output" : 97182 
        }, 
        "ok" : 1, 
} 

Then I run the following find command:

> db.stats6.find({'_id.playerId':'01426c26936541bca795309a01e4be53', '_id.day':'1/0/2012'}) 
{ "_id" : { "day" : "1/0/2012", "playerId" : "01426c26936541bca795309a01e4be53", "rapportStatus" : "INFOS" }, "value" : 31 } 
{ "_id" : { "day" : "1/0/2012", "playerId" : "01426c26936541bca795309a01e4be53", "rapportStatus" : "WARN" }, "value" : 1 } 

These values are wrong.
But when I run the same mapReduce with a query, as follows:

> db.report.mapReduce(map,reduce,{query : {'playerId':'01426c26936541bca795309a01e4be53'}, out:'stats7'}) 
{ 
        "result" : "stats7", 
        "timeMillis" : 533, 
        "counts" : { 
                "input" : 5014, 
                "emit" : 5014, 
                "reduce" : 185, 
                "output" : 97182 
        }, 
        "ok" : 1, 
} 

I get different results (which are actually correct):

> db.stats7.find({'_id.playerId':'01426c26936541bca795309a01e4be53', '_id.day':'1/0/2012'}) 
{ "_id" : { "day" : "1/0/2012", "playerId" : "01426c26936541bca795309a01e4be53", "rapportStatus" : "INFOS" }, "value" : 49 } 
{ "_id" : { "day" : "1/0/2012", "playerId" : "01426c26936541bca795309a01e4be53", "rapportStatus" : "WARN" }, "value" : 1 } 

Can somebody explain why?
Thanks

Damien



 Comments   
Comment by Antoine Girbal [ 15/Mar/12 ]

The reason is that the reduce function is not correct.
It may be called several times over intermediate results, so it must add up the actual values in the array rather than count the elements.

reduce = function(key, values) { 
  var count = 0; 
  for ( var i=0; i<values.length; i++ ) { count += values[i]; } 
  return count; 
}
or just
reduce = function(key, values) { return Array.sum(values); }
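To illustrate why counting elements breaks when reduce runs over intermediate results, here is a minimal stand-alone sketch in plain JavaScript (no MongoDB needed). The helper reduceInBatches and the batch size of 10 are assumptions made up for this illustration; they only mimic re-reduce and do not reflect how the server actually groups values.

```javascript
// Illustrative simulation only: reduceInBatches is a hypothetical helper,
// not a MongoDB API, and the batch size is arbitrary.

// Reduce from the description: counts array elements.
function countingReduce(key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) { count = count + 1; }
  return count;
}

// Corrected reduce: sums the values themselves.
function summingReduce(key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) { count += values[i]; }
  return count;
}

// Reduce the values in batches, then reduce the partial results again,
// mimicking a reduce that is called over intermediate results.
function reduceInBatches(reduceFn, key, values, batchSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += batchSize) {
    partials.push(reduceFn(key, values.slice(i, i + batchSize)));
  }
  return partials.length > 1 ? reduceFn(key, partials) : partials[0];
}

// 49 emits of the value 1 for a single key.
var emitted = new Array(49).fill(1);

var buggyResult = reduceInBatches(countingReduce, 'k', emitted, 10);
var fixedResult = reduceInBatches(summingReduce, 'k', emitted, 10);

console.log(buggyResult); // 5: the second pass counts the 5 partial results
console.log(fixedResult); // 49: the partial sums are added up
```

With a small input the values may be reduced in a single pass, in which case both versions agree; the difference only shows up once partial results are reduced again.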

Comment by Damien Viel [ 01/Mar/12 ]

Erratum: here is the correct output for the second mapReduce command:

> db.report.mapReduce(map,reduce,{query : {'playerId':'01426c26936541bca795309a01e4be53'}, out:'stats7'})
{
	"result" : "stats7",
	"timeMillis" : 512,
	"counts" : {
		"input" : 5014,
		"emit" : 5014,
		"reduce" : 185,
		"output" : 204
	},
	"ok" : 1,
}
 
