Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.6.0-rc2
Affects Version/s: 2.5.5
Component/s: MapReduce
Labels: None
Environment: CentOS 6.5 x86_64 Sharded
Backwards Compatibility: Fully Compatible
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

I'm running an incremental map-reduce over a large data set in a sharded environment. The output action is set to "reduce".
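For reference, a minimal sketch of the kind of command this describes, written as a plain Python dict. This is not the actual job: the collection names (`events`, `user_totals`), the `ts` and `userId` fields, the cutoff value, and the map/reduce bodies are all hypothetical placeholders. Only the overall `mapReduce` command shape and the `out: {reduce: ...}` action come from the setup above.

```python
import json

# Hypothetical JavaScript bodies, passed to the server as strings.
MAP_JS = "function () { emit(this.userId, 1); }"
REDUCE_JS = "function (key, values) { return Array.sum(values); }"


def incremental_mr_command(source, target, since):
    """Build a mapReduce command document that folds a new batch into `target`.

    `source`, `target` and `since` are illustrative parameters, not taken
    from the original job.
    """
    return {
        "mapReduce": source,               # command name must be the first key
        "map": MAP_JS,
        "reduce": REDUCE_JS,
        "query": {"ts": {"$gt": since}},   # only documents newer than the last run
        "out": {"reduce": target},         # merge new results into existing output
    }


# Arbitrary cutoff value, for illustration only.
cmd = incremental_mr_command("events", "user_totals", 1391000000)
print(json.dumps(cmd, indent=2))
```

With the "reduce" action, results for keys that already exist in the target collection are passed through the reduce function again together with the new values, so the reduce function has to accept its own output as input.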

This strategy works fine until the batch I'm running the MR on exceeds ~14 million documents, at which point the MR fails with error code 10334:

"MR Parallel Processing failed. errmsg: 'Exception: BSONObj size 16951756 ... is invalid ..."

I was under the impression that map/reduce has no size limits, provided no single document exceeds 16 MB. All of the documents I'm dealing with are ~400 bytes.

It doesn't fail immediately, so I suspect the cumulative data from one shard or another is exceeding this size. I wasn't aware that this was an issue with MR in sharded environments.
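A quick arithmetic check is consistent with that suspicion. The sketch below assumes the BSON document limit is 16 × 1024 × 1024 bytes and uses the approximate 400-byte document size mentioned above. It compares the size in the error message with the limit and estimates how many such documents it would take to fill one BSON object.

```python
BSON_MAX_BYTES = 16 * 1024 * 1024  # assumed 16 MB BSON document limit
REPORTED_BYTES = 16951756          # size quoted in the error message
DOC_BYTES = 400                    # approximate size of one input document

# How far the offending object is over the limit.
overshoot = REPORTED_BYTES - BSON_MAX_BYTES

# How many ~400-byte documents fit into a single maximum-size BSON object.
docs_to_fill = BSON_MAX_BYTES // DOC_BYTES

print(overshoot)     # 174540
print(docs_to_fill)  # 41943
```

So the failing object is only about 170 KB over the limit, and roughly 42,000 documents of this size packed into one object would be enough to reach it. No individual document comes close.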

Any ideas what's going on?

duplicates

SERVER-12949 MapReduce not doing incremental reduces when needed

Closed

Assignee: Mathias Stearn
Reporter: Brad C.
Participants: Brad C., Githook User, Mathias Stearn
Votes: 0
Watchers: 4

Created: Feb 04 2014 09:52:34 PM UTC
Updated: Jul 11 2016 05:17:15 PM UTC
Resolved: Mar 19 2014 07:36:33 PM UTC
