[SERVER-431] Increase the 4mb BSON Object Limit to 16mb Created: 19/Nov/09 Updated: 17/Sep/21 Resolved: 09/Dec/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 1.7.4 |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Damon Cortesi | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 31 |
| Labels: | None |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Participants: | |
| Description |
|
Mostly for tracking who/how many others are interested in this, but it would be nice to have the option of >4MB objects. My specific use case is the storage of Twitter social graph data. It's not too much of an issue at the moment, as it takes about a million IDs to overflow the limit, but it would be a "nice to have" to not have to hack up some other solution. |
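To put rough numbers on that claim, here is a minimal sketch, assuming the `bson` package that ships with PyMongo and made-up field names, that measures how the encoded size of a follower-ID array grows:

```python
import bson  # ships with PyMongo

# Hypothetical follower document: one user plus N numeric follower IDs.
# BSON arrays store each element with a type byte and its decimal index
# as the key, so per-element overhead grows with the index length.
def encoded_size(n_followers):
    doc = {"screen_name": "example", "follower_ids": list(range(n_followers))}
    return len(bson.encode(doc))

for n in (100_000, 500_000, 1_000_000):
    print(n, encoded_size(n), "bytes")
```

Each element carries a type byte and its string index in addition to the value itself, so the encoded size is noticeably larger than the raw IDs alone would suggest.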
| Comments |
| Comment by Ivan Fioravanti [ 17/Sep/21 ] |
|
Sorry to vote on an old ticket, but 11 years after the raise from 4MB to 16MB, I think that a raise to 64MB makes sense, no? I've opened a new ticket to consider this here: https://jira.mongodb.org/browse/SERVER-60040?filter=-4 |
| Comment by senthil [ 10/Apr/17 ] |
|
We are in the book publishing industry. We have lots of book metadata and feed it back into reports. Even though we can afford better infrastructure (> 256 GB of RAM, quad-core multiprocessors, and SSDs), the current limit does not meet our requirements. Please don't restrict this limit, as it is a hindrance for users of MongoDB. |
| Comment by Lewis Geer [ 26/Feb/15 ] |
|
Hi, sorry to comment on an old ticket, but there are real-world use cases for large documents, especially in biomedical applications. Let's say we have a collection of possible drugs. Some of these drugs we know almost nothing about, perhaps a registry name, a supplier, and a chemical structure. Others, like aspirin or penicillin, we know a whole lot about: clinical studies, pharmacology, and so on. So the average document is relatively small, but a few documents are huge. You can't omit these huge documents, as they are of great interest. This happens over and over again in biomedical databases; for example, you might know a lot about an organism named "human", but not a lot about "tasseled wobbegongs" and most other organisms. Of course, this can be coded around, but it would be nice not to be forced to, and it might help adoption of MongoDB in organizations that deal with biomedical information, like large research organizations. Thanks, |
| Comment by Roger Binns [ 05/Jun/13 ] |
|
Is there a ticket for getting rid of this limit (or having it work the way John suggested)? I'm now hitting the 16MB limit, which means I have to write and test two code paths: one for the majority of the data and one for the outliers. We don't run MongoDB on any machine with less than 32GB of RAM, so the current arbitrary limit does not help me in any way. In fact, it makes me waste time writing and testing more code. |
| Comment by Ron Mayer [ 23/Mar/11 ] |
|
Eliot wrote: "There is always going to be a limit, even if its crazy high like 2gb. So its really a question of what it is." If that's the question, my vote would be for "crazy high like 2gb". Well over 99.99% of the documents I'm storing fit comfortably in 4MB. However, the source data we're bringing into MongoDB (XML docs in this format: http://www.niem.gov/index.php from hundreds of government systems) doesn't have any hard constraints on document size. Yes, it's understandable that a huge document would be slow. No, it's not an option to simply drop the document. And it does kinda suck to have to code differently for the one-in-ten-thousand large documents. |
| Comment by Eliot Horowitz (Inactive) [ 12/Jan/11 ] |
|
We still believe the benefits of limiting documents to a fixed size outweigh the benefits of having no max size. Can you open a new ticket to track interest/thoughts? This ticket won't change for sure, and definitely not before 1.8. |
| Comment by Roger Binns [ 12/Jan/11 ] |
|
@Eliot: The problem is that there is no easy workaround. Any diligent developer is going to worry about these boundary conditions, and the point of putting the data in a database is that you really need the data saved. If the database rejects the data, then you have to code a plan B, which is a lot of work to foist on every application. You saw how much more work I had to do in an earlier message, and even that is far more brittle and has far more failure modes. (I also haven't written test code for it yet, but that is going to be a huge amount more.) This arbitrary limit means every client has to be coded with two ways of accessing data: regular and oversize. Solving it once at the database layer for all clients is far preferable. I very much agree with John's list of five. Note that none of those numbers are arbitrary, whereas the current limit is. I'll also admit that I was one of those people thinking that the 4MB limit was perfectly fine and anyone going over it wasn't dealing with their data design well. Right up till the moment my data legitimately went over 4MB ... |
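A sketch of the "two ways of accessing data" burden described above, assuming a recent PyMongo (which raises DocumentTooLarge for oversize documents); save_oversize and the collection names are hypothetical stand-ins for whatever plan B an application ends up maintaining:

```python
from pymongo import MongoClient
from pymongo.errors import DocumentTooLarge

coll = MongoClient().mydb.files  # assumed connection and collection

def save_document(doc):
    """Regular path first; only fall back to the oversize path when forced to."""
    try:
        return coll.insert_one(doc).inserted_id
    except DocumentTooLarge:
        # Plan B: the second, more brittle code path every client now needs.
        return save_oversize(coll, doc)

def save_oversize(collection, doc):
    # Hypothetical: split, compress, or externalize the large fields here.
    raise NotImplementedError
```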
| Comment by John Crenshaw [ 12/Jan/11 ] |
|
I think it is safe to say that everybody will accept any/all of the limits below without disappointment: The big problem is not whether we will normally want to store that much data in a single record, but whether it MIGHT get that large under extraordinary conditions. If we were dealing with records that were likely to get this large, we would be foolish to not restructure the code. Conversely, it seems rather silly to use a complicated model and have to send multiple queries to get the job done, just to avoid problems that might happen if somehow the structure becomes large enough to overflow the limits. The best model in this case (really) is the one that works best under 99.9% of conditions, but we can't use that model if it might overflow in the edge cases, even if it normally only overflows just a little. In real world terms, we're trying to avoid the case where that one user does something a bit strange (like writing a book in the comments), and overflows the record limits. Right now, avoiding this means restructuring the data into multiple collections and records anytime we don't have enough control over size or quantity of entries in an array. There are two types of structure that I can think of that might overflow in the edge cases. First: Some things that I thought of that might be like this are: The second structure is slightly similar to the first: Some things that I thought of that might be like this are: Sure, you can work around all these cases by adjusting the schema, but the most obvious schema, and the one that works best for 99.99% of the records in these cases, can't be used, because it might overflow at just the worst time. Adjusting the schema generally requires mountains of additional application code, and is less stable. This is why people are hoping for a system that manages to "somehow" behave itself when things go beyond the "normal" limits. |
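As an illustration of the trade-off described in this comment, here is a sketch, assuming PyMongo and made-up field names, of the "obvious" embedded schema next to the restructured version that avoids the size limit at the cost of extra queries and application code:

```python
# Obvious schema: comments live inside the post document. One read returns
# everything, but enough comments can eventually overflow the document limit.
embedded_post = {
    "_id": 1,
    "title": "Hello",
    "comments": [{"by": "ann", "text": "hi"}],
}

# Restructured schema: comments become their own collection keyed by post_id.
# No single document can overflow, but every page view now needs a second
# query and the application has to stitch the results back together.
split_post = {"_id": 1, "title": "Hello"}
split_comments = [{"post_id": 1, "by": "ann", "text": "hi"}]

def load_post_with_comments(db, post_id):
    post = db.posts.find_one({"_id": post_id})
    post["comments"] = list(db.comments.find({"post_id": post_id}))
    return post
```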
| Comment by Eliot Horowitz (Inactive) [ 12/Jan/11 ] |
|
The argument can be made at 501MB and 17MB, so once there's a limit, there's a limit. One hard technical limit is that an object has to fit in RAM. Can you give an example of a schema where you'd want documents that large? |
| Comment by Julian Morrison [ 11/Jan/11 ] |
|
The problem is not 500mb documents, it's situations where you can't be certain a document will never be 17mb. This will still be true of ANY fixed limit. Is there a technical reason it would be impossible for documents to just grow up to the bounds of storage, if necessary? You can still warn people that performance suffers unless documents are mostly small. |
| Comment by Eliot Horowitz (Inactive) [ 11/Jan/11 ] |
|
There is always going to be a limit, even if it's crazy high like 2GB. If you had a 500MB document, performance would be really, really bad. So 16MB seems to be the best of both worlds. When 1.8 is out for a while, we can look again. |
| Comment by John Crenshaw [ 10/Jan/11 ] |
|
I do see the value of the increase to another arbitrary limit (4MB was feeling a little cramped on the edge cases, so the increase gives room to breathe and feels GREAT), but I also understand where Walt is coming from. Are there any plans to allow the limit to be removed entirely? "Not breaking" is always more important to me than "uniform performance", so if performance is uniform... except in the cases where it would break right now, in which case it at least works... I'm a happy camper. (Besides, anything that grows that big probably triggered a lot of "non-uniform performance" long before it got to Mongo.) Now that you've already changed it once, I imagine that the "driver assumptions" reason is an acceptable loss. Am I missing something? In any case, it's nice to be up to 16MB. Thanks! |
| Comment by Walt Woods [ 09/Jan/11 ] |
|
I really don't understand the rationale for increasing this to another static limit... The whole issue is that it should be a flexible setting (e.g. runtime configuration) dependent on the use case of MongoDb. |
| Comment by Roger Binns [ 09/Jan/11 ] |
|
I'm another user bitten by this arbitrary limit. In one of my schemas, documents represent a file. I generate an opaque binary blob index of each file (a pickled Python data structure behind the scenes) and stick it in the document too. A recent algorithm change means that this binary blob grew larger. (It is a temporary change and will be optimized to a smaller size later.) For about 3 percent of my files the blob is now larger than 4MB. (The largest is 11MB; compression halves the size.) I am running on a server with 24GB of RAM, so these small sizes are trivial. I had to write new code to do the following:
i.e. this arbitrary limit forced me to do a lot more work, increased code complexity, and made everything far more brittle. I'd have no issue with the limit being in the hundreds of megabytes range, but a handful of megabytes really doesn't help. |
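A minimal sketch of the kind of chunking workaround being described, assuming PyMongo; the chunk size, collection names, and helper names are illustrative, and GridFS is the stock alternative MongoDB ships for blobs above the limit:

```python
import zlib
from bson.binary import Binary

CHUNK_BYTES = 4 * 1024 * 1024 - 64 * 1024  # stay safely under the server limit

def store_blob(db, file_id, blob):
    """Compress the blob and spread it across chunk documents."""
    data = zlib.compress(blob)
    chunks = [data[i:i + CHUNK_BYTES] for i in range(0, len(data), CHUNK_BYTES)]
    for n, chunk in enumerate(chunks):
        db.blob_chunks.insert_one({"file_id": file_id, "n": n, "data": Binary(chunk)})
    db.files.update_one({"_id": file_id}, {"$set": {"blob_chunks": len(chunks)}})

def load_blob(db, file_id):
    """Reassemble the chunks in order and decompress."""
    parts = db.blob_chunks.find({"file_id": file_id}).sort("n", 1)
    return zlib.decompress(b"".join(p["data"] for p in parts))
```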
| Comment by auto [ 09/Dec/10 ] |
|
Author: {u'login': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'} Message: increase bson size to 16mb |
| Comment by Eliot Horowitz (Inactive) [ 12/Nov/10 ] |
|
We're going to go with 16MB for 1.8. |
| Comment by auto [ 12/Oct/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: increase bson size to 8mb |
| Comment by auto [ 11/Oct/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: split bson max size into User and Internal |
| Comment by auto [ 11/Oct/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: using BSONObjMaxSize everywhere bson size comes into play |
| Comment by John Crenshaw [ 08/Oct/10 ] |
|
Ditto what Walt said. A compilation flag is too inflexible. I'm also paranoid. With every line of code I write I think "how could this break", which makes data modeling for Mongo pure torture. I'd love to see the 4MB limit just disappear. I don't care if a document becomes slower to work with after the 4MB limit breaks down. Although only slightly related, I also share Walt's frustration with being unable to return only a matching embedded document. Consider the case of comments on a blog post, which would be placed in the blog post record; but that makes it nearly impossible to create an admin section that deals with comments separately from the post. I think you can do it with MapReduce, but that seems like a really nasty way of doing it, and I expect it would slow things way down. Returning only the embedded document would of course be really helpful for document types that may become large. It's a lot nicer to return 2KB of embedded documents rather than 4MB of document. |
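Later server releases did add an $elemMatch projection for exactly this case; a sketch, assuming PyMongo and made-up field names, of pulling back just the matching comment rather than the whole post:

```python
from pymongo import MongoClient

db = MongoClient().blog  # assumed connection and database name

# Return the post title plus only the first comment by a given author,
# instead of the entire comments array.
doc = db.posts.find_one(
    {"comments.author": "ann"},
    {"title": 1, "comments": {"$elemMatch": {"author": "ann"}}},
)
```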
| Comment by Walt Woods [ 07/Oct/10 ] |
|
@Leon Mergen - Yeah; I really wouldn't want it to be a compilation flag, though. I'd much prefer the flexibility to specify it for different database storage points - a runtime configuration option. Even an API call for configuring a specific collection would be a good idea, as it might help prevent overflow in collections that really shouldn't have more than a 4MB limit. Also, unlimited would be nice.... Yes, I'm paranoid... |
| Comment by Leon Mergen [ 07/Oct/10 ] |
|
If we could build our own mongo server with, for example, a --max_object_size=16MB flag, with the 10gen-hosted binary staying at 4MB, that would be perfectly acceptable to me. I'd just like to gain a bit more control over the max size, instead of being told what my max object size should be, as long as I know the consequences of "disobeying" the recommended max object size. |
| Comment by Walt Woods [ 07/Oct/10 ] |
|
This is actually the reason I'm not porting my app over to MongoDB. I think MongoDB would be a better fit than CouchDB for my app, due to things like $push, $pull, and eventually virtual / indexed collections (it also pushes me away that I can filter on { 'foo.bar': 3 } but not grab only the matching embedded document), but I can't actually use these features unless I'm sure that my application won't throw errors when, e.g., a related object list grows too large (these embedded documents are very small and numerous, and shouldn't be their own documents; they are in CouchDB, and they would have to be in MongoDB at the moment as well). Maybe at the very least there could be a mongod flag that allows overriding the BSON size limit, "for experienced users willing to accept the risks". That would be a good compromise. |
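A sketch of the $push/$pull pattern being weighed here, assuming PyMongo and made-up names; the attraction is the single atomic update, and the worry is that each $push nudges the parent document toward the hard size limit:

```python
from pymongo import MongoClient

db = MongoClient().mydb  # assumed connection and database name

# Grow and shrink an embedded list in place with atomic updates.
db.things.update_one({"_id": 42}, {"$push": {"foo": {"bar": 3}}})
db.things.update_one({"_id": 42}, {"$pull": {"foo": {"bar": 3}}})

# The filter matches on the embedded field, but the query returns the whole
# parent document, not just the matching embedded element.
doc = db.things.find_one({"foo.bar": 3})
```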
| Comment by Leon Mergen [ 15/Jul/10 ] |
|
+1 for this fix; I concur with Julian's comment. In some rare cases we might hit documents larger than 4MB; degraded performance wouldn't be a problem, but data loss/exceptions would. |
| Comment by Khash Sajadi [ 14/Jul/10 ] |
|
I would love to see the limit increase. We're storing web pages in documents and this would help a lot. |
| Comment by David Lee [ 17/Jun/10 ] |
|
Perhaps the 4MB limit could be taken out in favor of adding documentation that MongoDB works best with small documents. Like Julian, I also worry that the 4MB limit would cause problems in some rare cases. |
| Comment by Julian Morrison [ 23/May/10 ] |
|
A hard limit is a fundamentally different kind of thing to degrading performance, even a steep degradation. What it means is that if your data might ever, ever approach the 4MB limit, even under fringe exceptional circumstances (comment thread featured on Digg, etc), then you are going to have to split your data across multiple objects, even if that's a lot of extra code and requires pointless extra queries and CPU work in the less-than-4mb case. Spikes of load outside the norm always do happen, and they're the worst time to have a site break. So if you find a way to make this limit go away, it makes designing an app that uses MongoDB a lot easier. |
| Comment by Eliot Horowitz (Inactive) [ 19/Nov/09 ] |
|
The 4MB limit isn't a hard limit per se; it's easy to change. If there is a large consensus that it should change, however, we certainly could. |