Type: Bug
Resolution: Cannot Reproduce
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: 2.6.5
Component/s: MapReduce, Performance
Labels:
- performance
Environment:
Debian, MongoDB version: 2.6.5

Operating System:
Linux
Steps To Reproduce:

1) in the mongo shell, paste the following to define a JS function that returns a test document with n random fields

function TestDoc (n) {
  var doc = {};
  // "* 13" selects only the first 13 of the 14 codes ('tr' is never chosen), which matches the 13 output keys reported below
  doc['lang'] = ['da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'nl', 'pt', 'ro', 'ru', 'sv', 'tr'][Math.floor(Math.random() * 13)];
  for (var i = 0; i < n; i++) { doc['fld_' + i] = Math.random().toString(34).slice(2); }
  return doc;
}

2) define a function that inserts nDocs documents with nFields fields each into a collection

function insertTestDocs (colName, nDocs, nFields) {
  for (var i = 0; i < nDocs; i++) {
    var doc = TestDoc(nFields);
    doc['_id'] = i;
    db[colName].insert(doc);
  }
}

3) insert 1000000 test documents

insertTestDocs("tmp_col", 1000000, 100)

4) create index on 'lang' field

db.tmp_col.ensureIndex({lang: 1})

5) run a mapReduce job that simply counts the documents for each distinct value of the lang field

db.runCommand({ mapreduce: "tmp_col", map: function () { emit(this.lang, 1); }, reduce: function (key, values) { return Array.sum(values); }, out: { inline: 1 }})

6) you get results of the following form

"timeMillis" : 116705, "counts" : { "input" : 1000000, "emit" : 1000000, "reduce" : 65000, "output" : 13 }, "ok" : 1

7) run the same mapReduce, except this time specify a sort

db.runCommand({ mapreduce: "tmp_col", map: function () { emit(this.lang, 1); }, reduce: function (key, values) { return Array.sum(values); }, sort:{lang:1}, out: { inline: 1 }})

8) you get the following results

"timeMillis" : 1478708, "counts" : { "input" : 1000000, "emit" : 1000000, "reduce" : 8474, "output" : 13 }, "ok" : 1

Notice that the job now takes 1478708 ms with the sort option instead of 116705 ms without it (that is ~12.7X slower)
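The slowdown can be worked out directly from the two result documents reported in the steps above. The snippet below is only an arithmetic sketch: `slowdownFactor` is a hypothetical helper name, not part of MongoDB, and the numbers are copied from the reported output.

```javascript
// Figures copied from the two result documents in the steps above.
var withoutSort = { timeMillis: 116705, reduce: 65000 };
var withSort = { timeMillis: 1478708, reduce: 8474 };

// Hypothetical helper (not a MongoDB API): how many times slower
// the candidate run is compared with the baseline run.
function slowdownFactor(baseline, candidate) {
  return candidate.timeMillis / baseline.timeMillis;
}

var factor = slowdownFactor(withoutSort, withSort);

// The sorted run also calls reduce far fewer times, so the extra
// time does not appear to come from additional reduce calls.
var fewerReduceCalls = withSort.reduce < withoutSort.reduce;
```

With these inputs `factor` comes out a little under 12.7, and `fewerReduceCalls` is true.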
Confidence Status:
None
Work Order:
0

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Map Reduce operations become very slow (more than one order of magnitude slower) when run with the sort option on the emit field.
This is contrary to the documentation, which states quite the opposite.
Kindly note:
1. The delay seems to be proportional to the number of fields per document and/or document complexity.
2. It does not seem to matter whether the map reduce outputs inline or to a collection.
3. I get about the same results on a standalone MongoDB and on a replicated one.
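To check these observations on another deployment, the same job can be timed twice, once without and once with the sort, and the two `timeMillis` values compared. The following is a minimal sketch under assumptions: `compareSortCost` is a hypothetical helper name, and `database` stands for any object exposing `runCommand()`, such as the shell's `db`. The map and reduce functions are the ones from the reproduction steps.

```javascript
// Hypothetical helper: runs the mapReduce from the reproduction steps
// twice and reports the server-side timings for both runs.
function compareSortCost(database, colName, sortSpec) {
  function run(sort) {
    var cmd = {
      mapreduce: colName, // command name must be the first key
      map: function () { emit(this.lang, 1); },
      reduce: function (key, values) { return Array.sum(values); },
      out: { inline: 1 }
    };
    if (sort) { cmd.sort = sort; } // add the sort only for the second run
    return database.runCommand(cmd).timeMillis;
  }
  var plainMillis = run(null);
  var sortedMillis = run(sortSpec);
  return {
    plainMillis: plainMillis,
    sortedMillis: sortedMillis,
    ratio: sortedMillis / plainMillis
  };
}
```

In the shell this could be called as `compareSortCost(db, "tmp_col", {lang: 1})`. Repeating the call against collections built with different `nFields` values would give one way to test whether the slowdown grows with the number of fields.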

Assignee: Ramon Fernandez Marina
Reporter: Nick Milonakis
Participants: Nick Milonakis, Ramon Fernandez Marina
Votes: 0
Watchers: 6

Created: Dec 14 2014 12:36:21 AM UTC
Updated: Apr 04 2015 01:05:33 PM UTC
Resolved: Apr 03 2015 07:44:02 PM UTC
