[SERVER-21863] map/reduce permits documents larger than 16MB to be inserted
Created: 11/Dec/15  Updated: 21/Nov/16  Resolved: 15/Jan/16

Status: Closed
Project: Core Server
Component/s: MapReduce, Replication
Affects Version/s: None
Fix Version/s: 3.2.5, 3.3.1

Type: Bug
Priority: Major - P3
Reporter: Eric Milkie
Assignee: Max Hirschhorn
Resolution: Done
Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-9502 Using regex in _id breaks replication Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: QuInt E (01/11/16), Query F (02/01/16)
Participants:

 Description   

This issue affects the following usages of the mapReduce command:

  • {out: {replace: <collectionName>}}
  • {out: {reduce: <collectionName>}} when <collectionName> doesn't exist or is empty
  • {out: {merge: <collectionName>}} when <collectionName> doesn't exist or is empty

The map/reduce code checks in two places that the value emit()-ted is less than BSONObjMaxUserSize / 2. It doesn't check that the value returned by the reduce() and finalize() functions won't lead to inserting a document larger than BSONObjMaxUserSize into the temporary collection, except incidentally when Helpers::upsert() is used.
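To make the gap concrete, here is a minimal Python sketch of the arithmetic. It is not the server code: the function names are hypothetical, sizes are plain byte counts, key and BSON overhead are ignored, and the reduce() is assumed to concatenate its inputs.

```python
# Simplified model of the size checks described above (assumptions throughout).
BSON_OBJ_MAX_USER_SIZE = 16 * 1024 * 1024  # 16MB user document limit


def emit_allowed(value_size: int) -> bool:
    """Model of the emit()-time check: value must be under half the limit."""
    return value_size < BSON_OBJ_MAX_USER_SIZE // 2


def reduced_size(value_sizes: list) -> int:
    """Hypothetical reduce() that concatenates its inputs: output is their sum."""
    return sum(value_sizes)


def fits_in_document(size: int) -> bool:
    """The check that is missing for reduce()/finalize() output in this model."""
    return size <= BSON_OBJ_MAX_USER_SIZE


# Three 6MB values each pass the emit() check on their own...
sizes = [6 * 1024 * 1024] * 3
every_emit_ok = all(emit_allowed(s) for s in sizes)

# ...but the reduced value is 18MB, which no longer fits in one document.
total = reduced_size(sizes)
result_fits = fits_in_document(total)
print(every_emit_ok, result_fits)
```

In this model every emitted value is individually acceptable, yet the reduced result exceeds the limit, so a check is also needed at the point where the reduced document is inserted.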


Original description

The code obliquely checks BSONObjMaxUserSize in a few places, but it's unclear whether it could eventually end up calling insertDocument() after reduce, with a document that's too big.



 Comments   
Comment by Kevin Pulo [ 15/Apr/16 ]

There are other ways that MapReduce can cause bad documents to be inserted in affected versions, for example, documents with a regex for the _id field (SERVER-9502). This causes secondaries to crash in 3.0 and earlier, and can cause silent replica set inconsistencies in 3.2. The fix on this ticket also prevents these cases, because it causes MR to call fixDocumentForInsert, which checks for bad _id values (and other erroneous conditions) in addition to checking the BSON document size.
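As a rough illustration of the two kinds of checks mentioned here, the following Python sketch models a pre-insert validation step. It is a simplified stand-in, not the behaviour of fixDocumentForInsert itself: the function name, the exception type, and passing the encoded size as a parameter (instead of BSON-encoding the document) are all assumptions made for the example.

```python
import re

BSON_OBJ_MAX_USER_SIZE = 16 * 1024 * 1024  # 16MB user document limit


class InvalidDocument(ValueError):
    """Hypothetical error type for documents rejected before insert."""


def validate_for_insert(doc: dict, encoded_size: int) -> dict:
    """Reject oversized documents and regex _id values (simplified model).

    encoded_size stands in for the BSON-encoded size of doc; a real
    implementation would compute it from the document itself.
    """
    if encoded_size > BSON_OBJ_MAX_USER_SIZE:
        raise InvalidDocument("document is larger than the maximum size")
    if isinstance(doc.get("_id"), re.Pattern):
        raise InvalidDocument("regex is not allowed as an _id value")
    return doc


# A small document with a string _id passes through unchanged.
ok = validate_for_insert({"_id": "k", "value": 1}, encoded_size=64)

# A regex _id is rejected before it could reach the collection.
try:
    validate_for_insert({"_id": re.compile("^a"), "value": 1}, encoded_size=64)
    regex_rejected = False
except InvalidDocument:
    regex_rejected = True
```

Routing every map-reduce insert through one such function means the size check and the _id check cannot be bypassed by a particular out mode.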

Comment by Githook User [ 29/Mar/16 ]

Author: Ramon Fernandez (ramon@mongodb.com)

Message: SERVER-21863, SERVER-22767: fix lint errors from recent backports.
Branch: v3.2
https://github.com/mongodb/mongo/commit/bc59fbb5a6aa562c010eff525bd3a0b905392d97

Comment by Githook User [ 29/Mar/16 ]

Author: Max Hirschhorn (username: visemet, max.hirschhorn@mongodb.com)

Message: SERVER-21863 Prevent map-reduce from inserting >16MB documents.

(cherry picked from commit 64a7daba1746dcda0f7d25eab82d35e2c093d54f)
Branch: v3.2
https://github.com/mongodb/mongo/commit/752031f3a0186167997631d64a2aea2409ab0f1a

Comment by Githook User [ 15/Jan/16 ]

Author: Max Hirschhorn (username: visemet, max.hirschhorn@mongodb.com)

Message: SERVER-21863 Prevent map-reduce from inserting >16MB documents.
Branch: master
https://github.com/mongodb/mongo/commit/64a7daba1746dcda0f7d25eab82d35e2c093d54f

Comment by J Rassi [ 11/Dec/15 ]
  • We should create a reproducible case of this issue.
  • Assuming this issue is reproducible, the map-reduce code should pass all documents to insert through fixDocumentForInsert().