I'm running an incremental map/reduce over a large data set in a sharded environment. The output mode is set to "reduce".
This strategy works fine until the batch I'm running the MR on exceeds ~14 million documents, at which point the MR fails with error code 10334, saying:
"MR Parallel Processing failed. errmsg: 'Exception: BSONObj size 16951756 ... is invalid ..."
I was under the impression that there are no size concerns when it comes to map/reduce, assuming of course that no single document exceeds 16 MB. (All of the documents I'm dealing with are ~400 bytes.)
It doesn't fail immediately, so I suspect it's the cumulative data from one shard or another that is exceeding this size. I wasn't aware that this was an issue with MR in sharded environments.
Any ideas what's going on?
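As a rough sanity check on the numbers above, here is a minimal Python sketch. It assumes the limit being hit is the 16 MB BSON document cap, taken as 16 * 1024 * 1024 bytes, and it uses the ~400 byte figure as an average; the function names are hypothetical and only illustrate the arithmetic, not how the server actually builds the object.

```python
# Assumption: the BSON document cap is 16 MB = 16 * 1024 * 1024 bytes.
MAX_BSON_BYTES = 16 * 1024 * 1024

# Size reported in the error message from this report.
REPORTED_BYTES = 16951756

# Approximate per-document size from this report (an average, not exact).
AVG_DOC_BYTES = 400


def overshoot(reported_bytes, limit=MAX_BSON_BYTES):
    """Return how many bytes the reported object exceeds the limit by."""
    return reported_bytes - limit


def docs_to_reach_limit(avg_doc_bytes, limit=MAX_BSON_BYTES):
    """Return how many whole documents of the given size fit under the limit."""
    return limit // avg_doc_bytes


if __name__ == "__main__":
    print("bytes over the limit:", overshoot(REPORTED_BYTES))
    print("documents that fit under the limit:", docs_to_reach_limit(AVG_DOC_BYTES))
```

Under those assumptions, the failing object is only slightly over the cap, and a few tens of thousands of ~400 byte values accumulated into one object without being reduced would be enough to reach it. That fits the suspicion that the failure comes from accumulated data rather than from any single input document.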
- duplicates SERVER-12949 MapReduce not doing incremental reduces when needed (Closed)