[SERVER-12949] MapReduce not doing incremental reduces when needed
Created: 27/Feb/14  Updated: 11/Jul/16  Resolved: 17/Mar/14

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.6.0-rc0
Fix Version/s: 2.6.0-rc2

Type: Bug
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Diff_2.4.9_2.6.0rc1.txt    
Issue Links:
Duplicate
is duplicated by SERVER-12615 Hitting Max Document size upon Map/Re... Closed
Related
Operating System: ALL
Steps To Reproduce:

Original Title: mapreduce stats have changed between 2.4 and 2.6

Start the mongo shell (mongo.exe) and run the following script. It succeeds against a 2.4 server and fails against a 2.6 server:

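var db = db.getSisterDB("TestDB");
db.dropDatabase();
 
var coll = db.getCollection("foo.mrInput");
coll.getDB();
 
// Insert 490 documents: i runs from 1 to 49, j from 0 to 9.
for (var i = 1; i < 50; i++) {
    for (var j = 0; j < 10; j++) {
        coll.insert({i: i, j: j});
    }
}
 
// Emit one count per document, keyed by j (10 distinct keys).
function mapFn() { emit(this.j, 1); }
// Sum the counts emitted for each key.
function reduceFn(key, values) { return Array.sum(values); }
 
var out = coll.mapReduce(mapFn, reduceFn, { out: { replace: "mrOutput", sharded: true } });
// 2.4 reports 30 reduces; 2.6 reports 10, so this assertion fails there.
assert.eq(out.counts.reduce, 30, "reduce count is wrong");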


 Description   

The reduce count statistic reported by mapReduce (counts.reduce) has changed between 2.4 and 2.6. It is possible that the 2.4 value was incorrect and has been fixed in 2.6, but this has to be verified.

To reproduce, see the script in the repro steps section.

In 2.4 the reduce count is 30; in 2.6 it is 10. The 2.6 value seems more reasonable, given that only 10 distinct keys are emitted, but this has to be confirmed.
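
As a rough illustration of how the same input can yield either count, the sketch below simulates a map/reduce run in plain JavaScript (no server needed). It is not the server's implementation: simulateReduceCount and its flushEvery trigger are hypothetical names, and the value 200 is chosen only because it happens to produce 30 reduces for this data set. The ticket does not state what actually triggers an intermediate reduce on the server.

```javascript
// Illustrative simulation only; not the server's implementation.
// `flushEvery` is a hypothetical trigger: run an intermediate reduce after
// every `flushEvery` emits (0 disables intermediate reduces).
function simulateReduceCount(docs, mapFn, reduceFn, flushEvery) {
    var pending = new Map(); // key -> values not yet collapsed
    var reduceCalls = 0;
    var emitted = 0;

    // Collapse every key holding more than one value into a single value.
    function reduceAll() {
        pending.forEach(function (values, key) {
            if (values.length > 1) {
                pending.set(key, [reduceFn(key, values)]);
                reduceCalls++;
            }
        });
    }

    function emit(key, value) {
        if (!pending.has(key)) pending.set(key, []);
        pending.get(key).push(value);
        emitted++;
        if (flushEvery > 0 && emitted % flushEvery === 0) reduceAll();
    }

    docs.forEach(function (doc) { mapFn.call(doc, emit); });
    reduceAll(); // final reduce

    var results = {};
    pending.forEach(function (values, key) { results[key] = values[0]; });
    return { reduce: reduceCalls, results: results };
}

// Same data as the repro script: 49 values of i times 10 values of j.
var docs = [];
for (var i = 1; i < 50; i++) {
    for (var j = 0; j < 10; j++) docs.push({ i: i, j: j });
}

// emit is passed in explicitly because plain JavaScript has no global emit.
function mapFn(emit) { emit(this.j, 1); }
// Plain-JavaScript stand-in for the shell's Array.sum.
function reduceFn(key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
}

var finalOnly = simulateReduceCount(docs, mapFn, reduceFn, 0);
var incremental = simulateReduceCount(docs, mapFn, reduceFn, 200);
console.log(finalOnly.reduce, incremental.reduce); // prints "10 30"
```

With a single final reduce, each of the 10 keys is reduced once. With two intermediate passes plus the final one, each key is reduced three times, giving 30. Both runs produce the same per-key totals (49), so the statistic differs while the output does not.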



 Comments   
Comment by Githook User [ 17/Mar/14 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-12949 Restore intermediate in-memory reduces, which are used to
keep the memory occupied by map/reduce small and spill extra large results
to disk.

Also adds a JS unit test to ensure this behaviour doesn't change
accidentally.

(cherry picked from commit acc47ab1b993d1abc295974200ffc57d98f4b80e)
Branch: v2.6
https://github.com/mongodb/mongo/commit/9f437614f40f07d627f2e2a16bdba08a7e12f890

Comment by Githook User [ 17/Mar/14 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-12949 Restore intermediate in-memory reduces, which are used to
keep the memory occupied by map/reduce small and spill extra large results
to disk.

Also adds a JS unit test to ensure this behaviour doesn't change
accidentally.
Branch: master
https://github.com/mongodb/mongo/commit/acc47ab1b993d1abc295974200ffc57d98f4b80e
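
Intermediate reduces, as described in the commit message above, feed a reduce function's own output back into it alongside new values. That is only safe when the reduce function gives the same answer in stages as in one pass, which MongoDB documents as a requirement for reduce functions. The following is a minimal plain-JavaScript sketch of such a check; isReReduceSafe and countFn are hypothetical names introduced for illustration and are not part of the commit or its unit test.

```javascript
// Sums the values for one key; plain-JavaScript stand-in for Array.sum.
function reduceFn(key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
}

// Counter-example: counts array entries, so it treats an earlier partial
// result as a single value and loses information when re-reduced.
function countFn(key, values) {
    return values.length;
}

// Hypothetical helper: true if reducing the first `splitAt` values, then
// reducing that partial result together with the rest, matches one pass.
function isReReduceSafe(fn, key, values, splitAt) {
    var oneShot = fn(key, values);
    var partial = fn(key, values.slice(0, splitAt));
    var staged = fn(key, [partial].concat(values.slice(splitAt)));
    return oneShot === staged;
}

var ones = [1, 1, 1, 1, 1];
console.log(isReReduceSafe(reduceFn, 0, ones, 3)); // prints "true"
console.log(isReReduceSafe(countFn, 0, ones, 3));  // prints "false"
```

The reduce function in the repro script sums its inputs, so it passes this check, and the output is the same whether the server reports 10 or 30 reduces.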

Comment by Daniel Pasette (Inactive) [ 03/Mar/14 ]

Reproduced on 2.4 (after removing the mergeChunk command). I don't know what work would have caused this change, though, so we definitely want to explain it.
