[SERVER-2398] for inline mapreduce, all emitted objects are kept in RAM before the 1st reduce, potential high memory usage Created: 24/Jan/11  Updated: 12/Jul/16  Resolved: 26/Jan/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 1.7.5

Type: Bug Priority: Major - P3
Reporter: Antoine Girbal Assignee: Antoine Girbal
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

During the map phase, checkSize() is called to run reduceInMemory and, if needed, dumpToInc.
In inline mode, however, checkSize doesn't do anything.
As a result, all objects are emitted before the first attempt to reduce.
Instead, reduceInMemory should be called when the in-memory map exceeds a certain size, or when there is potential for reduction.
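
The proposed behavior can be modeled outside the server. The following is a minimal sketch in plain JavaScript, not the actual C++ implementation: makeEmitBuffer, thresholdBytes, sizeOf and the byte estimate based on JSON.stringify are all hypothetical names and assumptions chosen for illustration. It reduces the buffered values whenever the estimated size passes a threshold and at least one key holds more than one value.

```javascript
// Hypothetical model of an inline map/reduce emit buffer (illustration only).
function sizeOf(value) {
  // Rough size estimate; assumes values are JSON-serializable.
  var s = JSON.stringify(value);
  return s === undefined ? 0 : s.length;
}

function makeEmitBuffer(reduce, thresholdBytes) {
  var groups = new Map();     // key -> array of buffered values
  var approxBytes = 0;        // estimated size of all buffered values
  var buffered = 0;           // number of buffered values
  var hasDuplicates = false;  // true once some key holds more than one value
  var peakBuffered = 0;       // high-water mark, for observing memory pressure

  function reduceInMemory() {
    approxBytes = 0;
    buffered = 0;
    groups.forEach(function (vals, key) {
      // Collapse every key with several values down to a single reduced value.
      var reduced = vals.length > 1 ? [reduce(key, vals)] : vals;
      groups.set(key, reduced);
      approxBytes += sizeOf(reduced[0]);
      buffered += 1;
    });
    hasDuplicates = false;
  }

  function emit(key, value) {
    var vals = groups.get(key);
    if (vals) {
      vals.push(value);
      hasDuplicates = true;
    } else {
      groups.set(key, [value]);
    }
    approxBytes += sizeOf(value);
    buffered += 1;
    if (buffered > peakBuffered) peakBuffered = buffered;
    // Reduce early only when it can actually shrink the buffer.
    if (approxBytes >= thresholdBytes && hasDuplicates) reduceInMemory();
  }

  function finish() {
    reduceInMemory();
    var results = [];
    groups.forEach(function (vals, key) {
      results.push({ _id: key, value: vals[0] });
    });
    return { results: results, peakBuffered: peakBuffered };
  }

  return { emit: emit, finish: finish };
}

// Usage: one key emitted many times, as in the reproduction in the comments.
function sumReduce(key, vals) {
  var sum = 0;
  for (var i = 0; i < vals.length; ++i) { sum += vals[i]; }
  return sum;
}

var buffer = makeEmitBuffer(sumReduce, 50 * 1024);
for (var n = 0; n < 100000; ++n) { buffer.emit(1, 1); }
var outcome = buffer.finish();
console.log(outcome.results, outcome.peakBuffered);
```

In this model the buffer never holds all emitted values at once, which is the property the ticket asks for. The duplicate check avoids pointless reduce passes when every key is unique.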



 Comments   
Comment by Antoine Girbal [ 25/Jan/11 ]

Here is a test that demonstrates the problem.
Add 1,000,000 documents to a collection:
foo:PRIMARY> for (var i = 0; i < 1000000; ++i) { db.large.save({a: Math.random(10000), str: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}) }

Use a map function that always emits the same key:
foo:PRIMARY> map = function() { emit(1, 1); }
function () {
    emit(1, 1);
}
foo:PRIMARY> reduce = function(key, vals) { var sum = 0; for (var i = 0; i < vals.length; ++i) { sum += vals[i]; } return sum; }
function (key, vals) {
    var sum = 0;
    for (var i = 0; i < vals.length; ++i) { sum += vals[i]; }
    return sum;
}

Then run the map/reduce:
foo:PRIMARY> a = db.large.mapReduce(map, reduce, {out: { inline : 1}});
The operation takes a very long time because the internal map grows large.
I let it run for about 1000 s and eventually killed it.
Resident memory usage also increases to 1 GB and beyond.

Added a fix where data gets reduced every 50 KB if there are potential duplicates.
The operation now completes in about 20 s.
The memory usage of mongod also no longer increases (it stays at 356 MB).
foo:PRIMARY> a = db.large.mapReduce(map, reduce, {out: { inline : 1}});
{
"results" : [
    { "_id" : 1, "value" : 1020000 }
],
"timeMillis" : 21974,
"counts" : { "input" : 1020000, "emit" : 1020000, "output" : 1 },
"ok" : 1,
}
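
Reducing early is only safe when the reduce function can be applied to its own output, since partial results are fed back into later reduce calls. The following sketch checks that property for the reduce function used in this test. reduceInChunks and the chunk size of 64 are hypothetical, chosen only for illustration, and the snippet runs in any JavaScript engine without a server.

```javascript
// The reduce function from the test above.
function reduce(key, vals) {
  var sum = 0;
  for (var i = 0; i < vals.length; ++i) { sum += vals[i]; }
  return sum;
}

// Hypothetical helper: reduce fixed-size chunks first, then reduce the
// partial results, mimicking what an early in-memory reduce does.
function reduceInChunks(key, vals, chunkSize) {
  var partials = [];
  for (var i = 0; i < vals.length; i += chunkSize) {
    partials.push(reduce(key, vals.slice(i, i + chunkSize)));
  }
  return reduce(key, partials);
}

var vals = new Array(1000).fill(1);
var allAtOnce = reduce(1, vals);
var chunked = reduceInChunks(1, vals, 64);

// Both paths should agree for a reduce function that is safe to re-apply.
console.log(allAtOnce === chunked);
```

A reduce function that, for example, returned a count of vals.length instead of a sum would fail this check, and would give wrong results once incremental reduction is enabled.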
