[SERVER-7037] Error in M/R -- 'value too large to reduce' Created: 13/Sep/12  Updated: 26/Aug/17  Resolved: 11/Apr/13

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.0.0, 2.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mete Dizioglu Assignee: Tad Marshall
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

Description

When running a map/reduce job on a collection that contains a large number of entities for a single map/reduce key, the computation fails because of the size of the object fed into the reduce.
In our case the reduce accumulates the values it encounters into an array. The finalize step would then trim these values down and apply the final operations to produce the final object. This accumulation cannot be filtered beforehand; it needs all of the information.

Code:
=====

m = function(){
    /**
     * @description Mapper
     **/
    if( this._attributes ){
        emit( this._attributes.entityid,
              {
                  'entity': this,
                  'deltas': []
              } );
    } else if( this.RelatedEntityId ){
        emit( this.RelatedEntityId,
              {
                  'entity': undefined,
                  'deltas': [this]
              } );
    }
};
 
 
r = function(key, values){
    /**
     * @description Reducer function
     * @param key
     * @param values Values associated with the given key
     * @return map containing up to 2 fields: deltas with the accumulated
     *         list of deltas, entity with the entity (when seen)
     **/
    var reducedPackage = { 'deltas' : [],
                           'entity' : undefined };
    for( var indexValue in values ){
        var currentValue = values[indexValue];
        if( currentValue.entity ){
            reducedPackage.entity = currentValue.entity;
        }
        reducedPackage.deltas = reducedPackage.deltas.concat( currentValue.deltas );
    }
    return reducedPackage;
};
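
One way around the limit, assuming only references to the deltas (rather than the full documents) are needed downstream, would be to group with the aggregation framework and collect _ids only, which keeps each per-key result far below the BSON cap. A minimal sketch (the collection name 'entities' is hypothetical, and allowDiskUse requires MongoDB 2.6+):

// Hypothetical sketch: collect only _ids per key instead of whole documents.
db.entities.aggregate([
    { $group: {
        '_id': { $ifNull: [ '$_attributes.entityid', '$RelatedEntityId' ] },
        'deltaIds': { $push: '$_id' }   // ids only, not whole documents
    } }
], { allowDiskUse: true });   // allowDiskUse needs MongoDB 2.6+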

Additional information:
=======================
Avg Object Size: 82 KB, Size of the database: 60 MB, Storage: 80 MB
Object Count: 801
Number of MapReduce keys: 1
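
For scale: with all 801 documents mapped to one key at an average of 82 KB each, the accumulated deltas array alone comes to roughly 801 × 82 KB ≈ 64 MB, about four times the 16 MB BSON document limit; if I read the server check correctly, error 13070 fires at roughly half that limit (to leave room for combining reduced values), so this reduce cannot succeed as written.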



Comments
Comment by Jörg Rech [ 26/Aug/17 ]

When running an M/R job on a large dataset I run into the same problem. As this issue is still open, I will comment here rather than raise another issue.

We run a deduplication M/R job on 594,356,796 documents, both in a MongoDB cluster and on a single instance (both installations hold the same data):
MongoDB shell version v3.4.6


On the cluster we get the following error:

Caught: com.mongodb.CommandFailureException: { "serverUsed" : "localhost:27017" , "ok" : 0.0 , "errmsg" : "MR post processing failed: { ok: 0.0, errmsg: \"value too large to reduce\", code: 13070, codeName: \"Location13070\" }"}

logs - from one shard:
...
2017-08-23T18:08:32.009+0000 I - [conn332363] M/R: (3/3) Final Reduce Progress: 287129100
2017-08-23T18:08:35.013+0000 I - [conn332363] M/R: (3/3) Final Reduce Progress: 287163200
2017-08-23T18:08:38.007+0000 I - [conn332363] M/R: (3/3) Final Reduce Progress: 287198100
2017-08-23T18:08:38.624+0000 I COMMAND [conn332363] CMD: drop <DB>.tmp.mrs.profile_1503387093_19
2017-08-23T18:08:38.783+0000 I COMMAND [conn332363] command <DB>.tmp.mr.profile_37 command: renameCollection { renameCollection: "<DB>.tmp.mr.profile_37", to: "<DB>.tmp.mrs.profile_1503387093_19", stayTemp: true } numYields:0 reslen:117 locks:{ Global: { acquireCount: { r: 1437654548, w: 854234067, W: 3 } }, Database: { acquireCount: { r: 285961588, w: 854234064, R: 5748651, W: 6 } }, Collection: { acquireCount: { r: 285961588, w: 570719347 } }, Metadata: { acquireCount: { w: 283514720 } }, oplog: { acquireCount: { w: 283514720 } } } protocol:op_query 111ms


On a single server we get the following error:

Caught: com.mongodb.CommandFailureException: { "serverUsed" : "localhost:27017" , "ok" : 0.0 , "errmsg" : "Converting from JavaScript to BSON failed: Object size 16894234 exceeds limit of 16793600 bytes." , "code" : 17260 , "codeName" : "Location17260"}


logs:
...
2017-08-25T08:34:13.002+0000 I - [conn496] M/R: (3/3) Final Reduce Progress: 645200400
2017-08-25T08:34:16.008+0000 I - [conn496] M/R: (3/3) Final Reduce Progress: 645245200
2017-08-25T08:34:19.003+0000 I - [conn496] M/R: (3/3) Final Reduce Progress: 645288400
2017-08-25T08:34:22.029+0000 I COMMAND [conn496] CMD: drop <DB>.tmp.mr.profile_106
2017-08-25T08:34:22.072+0000 I COMMAND [conn496] mr failed, removing collection17260 Converting from JavaScript to BSON failed: Object size 16894234 exceeds limit of 16793600 bytes.
2017-08-25T08:34:22.072+0000 I COMMAND [conn496] CMD: drop <DB>.tmp.mr.profile_106
2017-08-25T08:34:22.174+0000 I COMMAND [conn496] command <DB>.tmp.mr.profile_106_inc command: mapReduce { mapreduce: "profile", map: "

Is there any setting we could increase so that the M/R job works?
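
As far as I know, the 16 MB BSON limit (and the stricter per-value check behind error 13070) is compiled into the server, so no setting raises it; the reduced value itself has to stay small. A minimal sketch of a size-bounded dedup reduce, assuming a hypothetical dedupKey field on the profile collection from the logs above:

var m2 = function(){
    // Hypothetical dedup mapper: 'dedupKey' is an assumed field name.
    emit( this.dedupKey, { 'count': 1, 'ids': [this._id] } );
};
var r2 = function(key, values){
    // Sum counts and keep only a capped sample of ids so the reduced
    // value stays far below the BSON limit, even across re-reduces.
    var out = { 'count': 0, 'ids': [] };
    values.forEach(function(v){
        out.count += v.count;
        out.ids = out.ids.concat( v.ids );
    });
    out.ids = out.ids.slice(0, 100);   // counts stay exact; ids are a sample
    return out;
};
db.profile.mapReduce( m2, r2, { out: { replace: 'profile_dupes' } } );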

Comment by Tad Marshall [ 13/Sep/12 ]

Would you be able to attach a log file showing the error?
