Hitting Max Document size upon Map/Reduce?


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.0-rc2
    • Affects Version/s: 2.5.5
    • Component/s: MapReduce
    • Environment:
      CentOS 6.5 x86_64 Sharded
    • Backwards Compatibility: Fully Compatible

      I'm running an incremental map/reduce over a large data set in a sharded environment. The output is set to "reduce".

      This strategy works fine until the batch I'm running the MR on exceeds ~14 million documents, at which point the MR fails with error code 10334:

      "MR Parallel Processing failed. errmsg: 'Exception: BSONObj size 16951756 ... is invalid ..."

      I was under the impression that there are no size concerns when it comes to map/reduce (assuming, of course, that no single document exceeds 16 MB; all of the documents I'm dealing with are ~400 bytes).

      It doesn't fail immediately, and I suspect it's the cumulative data from one shard or another that is exceeding this size. I wasn't aware this was an issue with MR in sharded environments.

      Any ideas what's going on?
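      One plausible explanation (my assumption, not confirmed in this ticket): with output set to "reduce", each incremental run re-reduces the new batch against the existing output collection, so if the reduce function accumulates per-key values (for example, appending them to an array) rather than collapsing them, the stored value for a hot key grows with every batch until it trips the BSON document limit. A back-of-the-envelope sketch using the ~400-byte figure from this report:

```python
# Hypothetical arithmetic sketch: how many ~400-byte values can a single
# reduced output document accumulate before exceeding MongoDB's 16 MB
# BSON document size limit?
MAX_BSON_SIZE = 16 * 1024 * 1024   # 16 MB document size limit (16777216 bytes)
PER_VALUE_BYTES = 400              # approximate per-document size from the report

# Smallest number of accumulated values that overflows one output document.
docs_until_overflow = MAX_BSON_SIZE // PER_VALUE_BYTES + 1
print(docs_until_overflow)  # 41944
```

      On these assumptions, a single key collecting values from well under 0.5% of a ~14-million-document batch would already exceed the limit, which would be consistent with the failure only appearing on large batches (and with the reported 16951756-byte BSONObj slightly overshooting the 16777216-byte cap).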

            Assignee:
            Mathias Stearn
            Reporter:
            Brad C.
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: