[SERVER-2761] Confusing Map / Reduce output when sharded Created: 15/Mar/11  Updated: 12/Jul/16  Resolved: 13/Jun/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.8.0-rc2
Fix Version/s: 1.9.1

Type: Improvement Priority: Trivial - P5
Reporter: Gaetan Voyer-Perrault Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Any


Participants:

 Description   

In a sharded environment, the output of the "mapreduce" command is confusing.

The following map-reduce generates a "tags" collection. That "tags" collection contains 5077 documents.

Problems:
#1: The number 5077 does not appear anywhere in that output.
#2: Based on the groups question linked below, it looks like the "output" value from each shard may not be the last reduction for that node.
So the "output" value for a node may misrepresent the data that is actually on that node.

======
> db.runCommand({ mapreduce: 'test', map: map, reduce: reduce, out: 'tags' });
{
    "result" : "tags",
    "shardCounts" : {
        "localhost:6900" : { "input" : 36443, "emit" : 30756, "output" : 2906 },
        "localhost:6901" : { "input" : 18808, "emit" : 12933, "output" : 1948 },
        "localhost:6902" : { "input" : 27274, "emit" : 24070, "output" : 2765 },
        "localhost:6903" : { "input" : 45220, "emit" : 43915, "output" : 3742 },
        "localhost:6904" : { "input" : 48399, "emit" : 45688, "output" : 3842 },
        "localhost:6905" : { "input" : 27413, "emit" : 23050, "output" : 2837 }
    },
    "counts" : { "emit" : NumberLong(180412), "input" : NumberLong(203557), "output" : NumberLong(18040) },
    "ok" : 1,
    "timeMillis" : 10978,
    "timing" : { "shards" : 10609, "final" : 368 }
}
> db.tags.count()
5077
======
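
To make the mismatch concrete, the per-shard counters can be added up by hand. The sketch below is plain JavaScript (not a MongoDB API call); the numbers are copied from the "shardCounts" field in the output above, and the helper name sumField is made up for this illustration.

```javascript
// Per-shard counters copied from the "shardCounts" field of the output above.
const shardCounts = {
  "localhost:6900": { input: 36443, emit: 30756, output: 2906 },
  "localhost:6901": { input: 18808, emit: 12933, output: 1948 },
  "localhost:6902": { input: 27274, emit: 24070, output: 2765 },
  "localhost:6903": { input: 45220, emit: 43915, output: 3742 },
  "localhost:6904": { input: 48399, emit: 45688, output: 3842 },
  "localhost:6905": { input: 27413, emit: 23050, output: 2837 },
};

// Hypothetical helper: sum one counter across every shard.
function sumField(counts, field) {
  return Object.values(counts).reduce((total, shard) => total + shard[field], 0);
}

const summed = {
  input: sumField(shardCounts, "input"),
  emit: sumField(shardCounts, "emit"),
  output: sumField(shardCounts, "output"),
};

console.log(summed); // { input: 203557, emit: 180412, output: 18040 }
```

The summed "output" (18040) equals the top-level "counts.output" but not db.tags.count() (5077), which is what you would expect if many keys are produced on more than one shard and only merged in the final step.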

Associated groups question:
http://groups.google.com/group/mongodb-user/browse_thread/thread/65869b34e4230a3b#



 Comments   
Comment by Antoine Girbal [ 13/Jun/11 ]

Fixed:

  • the "output" count is now the number of documents in the final collection
  • the "reduce" count is the number of reduces performed on each shard plus the ones done in the final step

{
    "result" : "mroutputshard",
    "shardCounts" : {
        "foo/localhost:27017" : { "input" : 1264, "emit" : 1264, "reduce" : 34, "output" : 26 },
        "foo2/localhost:37017" : { "input" : 1238, "emit" : 1238, "reduce" : 48, "output" : 39 }
    },
    "counts" : { "emit" : 2502, "input" : 2502, "output" : 40, "reduce" : 107 },
    "ok" : 1.0,
    "timeMillis" : 1351,
    "timing" : { "shards" : 1305, "final" : 46 }
}
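
The numbers in this sample output can be cross-checked with a little arithmetic. The sketch below is plain JavaScript with values copied from the output above; the variable names are invented here, and reading the difference as "keys present on both shards" is an assumption rather than something the server reports.

```javascript
// Values copied from the sample output above.
const shards = [
  { name: "foo/localhost:27017", reduce: 34, output: 26 },
  { name: "foo2/localhost:37017", reduce: 48, output: 39 },
];
const counts = { output: 40, reduce: 107 };

// Reduces done on the shards themselves.
const shardReduces = shards.reduce((n, s) => n + s.reduce, 0); // 34 + 48 = 82

// Whatever is left over must have happened in the final step.
const finalStepReduces = counts.reduce - shardReduces; // 107 - 82 = 25

// Documents produced per shard versus documents in the final collection.
const shardOutputs = shards.reduce((n, s) => n + s.output, 0); // 26 + 39 = 65
const mergedAway = shardOutputs - counts.output; // 65 - 40 = 25

console.log({ shardReduces, finalStepReduces, shardOutputs, mergedAway });
```

Both differences come out to 25, which is consistent with 25 keys appearing on both shards and each needing one extra reduce in the final step.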

Comment by Antoine Girbal [ 13/Jun/11 ]

The "counts" object is just the sum of the same fields from each shard.
While the summed "emit" and "input" are correct, the summed "output" and "reduce" are wrong.
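
A toy model may help show why summing works for "emit" and "input" but not for "output" and "reduce". This is a simplified sketch in plain JavaScript, not the server's implementation: the shard data, the word-count style reduceValues function, and the finalMerge helper are all hypothetical.

```javascript
// Hypothetical per-shard map-reduce results: key -> already-reduced value.
const shardResults = [
  { mongodb: 3, sharding: 1, index: 2 },
  { mongodb: 5, index: 1, replica: 4 },
];

// Example reduce function (an assumption): add up the values for a key.
function reduceValues(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Simplified final step: group values by key across shards, then reduce
// only the keys that more than one shard produced.
function finalMerge(results) {
  const grouped = new Map();
  for (const result of results) {
    for (const [key, value] of Object.entries(result)) {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    }
  }
  const merged = {};
  let finalReduces = 0;
  for (const [key, values] of grouped) {
    if (values.length > 1) {
      merged[key] = reduceValues(key, values);
      finalReduces += 1;
    } else {
      merged[key] = values[0];
    }
  }
  return { merged, finalReduces };
}

// Naive total: add up the number of documents each shard produced.
const summedOutput = shardResults.reduce((n, r) => n + Object.keys(r).length, 0);
const { merged, finalReduces } = finalMerge(shardResults);
const finalOutput = Object.keys(merged).length;

console.log({ summedOutput, finalOutput, finalReduces });
// { summedOutput: 6, finalOutput: 4, finalReduces: 2 }
```

In this model the summed "output" overcounts by the number of keys shared between shards, and the summed "reduce" misses the reduces done in the final step, while inputs and emits happen exactly once per document and so can be added safely.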

Generated at Thu Feb 08 03:01:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.