Core Server / SERVER-431

Increase the 4MB BSON Object Limit to 16MB

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.4
    • Component/s: None
    • Labels: None

      Description

      Mostly for tracking who/how many others are interested in this, but it would be nice to have the option of >4MB objects.

      My specific use case is the storage of Twitter social graph data. It's not too much of an issue at the moment, as it takes about a million IDs to overflow the limit, but it would be a "nice to have" not to have to hack up some other solution.
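
      For a rough sense of that arithmetic, here is a sketch assuming the bson module that ships with pymongo; the document layout is a hypothetical stand-in for the reporter's schema. Each BSON array element carries a type byte and its stringified index as a key, so the per-ID cost grows as the array gets longer.

          import bson  # the BSON codec that ships with pymongo

          LIMIT = 4 * 1024 * 1024  # the 4MB limit discussed in this ticket

          def encoded_size(n_ids):
              # Store n_ids numeric IDs in a single document, as the
              # Twitter social-graph use case above would.
              doc = {"user_id": 12345, "follower_ids": list(range(n_ids))}
              return len(bson.encode(doc))

          for n in (100000, 250000, 500000, 1000000):
              size = encoded_size(n)
              status = "over" if size > LIMIT else "under"
              print("%9d ids -> %5.1f MB (%s the limit)"
                    % (n, size / (1024.0 * 1024.0), status))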


          Activity

          Roger Binns added a comment -

          @Eliot: The problem is that there is no easy workaround. Any diligent developer is going to worry about these boundary conditions, and the point of putting the data in a database is that you really need the data saved. If the database rejects the data, then you have to code a plan B, which is a lot of work to foist on every application. You saw how much more work I had to do in an earlier message, and even that is far more brittle and has far more failure modes. (I also haven't written test code for it yet, but that is going to be a huge amount more.) This arbitrary limit means every client has to be coded with two ways of accessing data - regular and oversize. Solving it once at the database layer for all clients is far preferable.

          I very much agree with John's list of five. Note that none of those numbers are arbitrary, whereas the current limit is. I'll also admit that I was one of those people who thought the 4MB limit was perfectly fine and that anyone going over it wasn't dealing with their data design well. Right up till the moment my data legitimately went over 4MB ...
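
          The "plan B" dual code path described above might look something like the following minimal sketch, assuming pymongo and GridFS (the database, collection, and field names are hypothetical):

              import bson
              import gridfs
              from pymongo import MongoClient

              db = MongoClient().example_db
              fs = gridfs.GridFS(db)

              MAX_BSON = 4 * 1024 * 1024  # the server limit at the time of this ticket

              def save_doc(doc):
                  # Regular path for ordinary documents, oversize path for
                  # the outliers -- the two ways of accessing data that
                  # every client would otherwise have to implement itself.
                  raw = bson.encode(doc)
                  if len(raw) < MAX_BSON:
                      db.docs.insert_one(doc)
                  else:
                      # Oversize path; assumes the caller sets _id on the doc.
                      file_id = fs.put(raw)  # spill the raw BSON into GridFS
                      db.docs.insert_one({"_id": doc["_id"], "gridfs_id": file_id})

              def load_doc(doc_id):
                  doc = db.docs.find_one({"_id": doc_id})
                  if doc is not None and "gridfs_id" in doc:
                      return bson.decode(fs.get(doc["gridfs_id"]).read())
                  return doc

          Even this small sketch carries the extra failure modes mentioned above: the stub insert and the GridFS write are not atomic, and both paths need their own tests.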

          Eliot Horowitz added a comment -

          We still believe the benefits of limiting to a fixed size outweigh the benefits of no max size.

          Can you open a new ticket to track interest/thoughts?

          This ticket won't change for sure, and definitely not before 1.8.

          Ron Mayer added a comment -

          Eliot wrote: "There is always going to be a limit, even if its crazy high like 2gb. So its really a question of what it is."

          If that's the question, my vote would be for "crazy high like 2gb".

          Well over 99.99% of the documents I'm storing fit comfortably in 4MB. However, the source data we're bringing into MongoDB (XML docs in this format: http://www.niem.gov/index.php, from hundreds of government systems) has no hard constraints on document size.

          Yes, it's understandable that a huge document would be slow.

          No, it's not an option to simply drop the document.

          And it does kinda suck to have to code differently for the one-in-ten-thousand large documents.
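
          Since documents over the limit never make it into the database, that one-in-ten-thousand rate has to be measured against the incoming files. A minimal sketch (the glob path is hypothetical, and raw XML size is only a rough proxy for the size of the converted BSON):

              import glob
              import os

              LIMIT = 4 * 1024 * 1024  # the 4MB limit

              sizes = [os.path.getsize(p) for p in glob.glob("incoming/*.xml")]
              oversize = sum(1 for s in sizes if s > LIMIT)
              print("%d of %d source docs exceed the limit" % (oversize, len(sizes)))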

          Roger Binns added a comment -

          Is there a ticket for getting rid of this limit (or having it like John suggested)?

          I'm now hitting the 16MB limit, which means I have to write and test two code paths - one for the majority of the data and one for the outliers. We don't run MongoDB on any machine with less than 32GB of RAM, so the current arbitrary limit does not help me in any way. In fact, it makes me waste time writing and testing more code.

          Lewis Geer added a comment -

          Hi,

          Sorry to comment on an old ticket, but there are real-world use cases for large documents, especially in biomedical applications. Let's say we have a collection of possible drugs. Some of these drugs we know almost nothing about, perhaps a registry name, a supplier, and a chemical structure. Others, like aspirin or penicillin, we know a whole lot about: clinical studies, pharmacology, and so on. So the average document is relatively small, but a few documents are huge. You can't omit these huge documents, as they are of great interest. This happens over and over again in biomedical databases; for example, you might know a lot about an organism named "human" but not much about "tasseled wobbegongs" and most other organisms. Of course, this can be coded around, but it would be nice not to be forced to, and it might help adoption of MongoDB in organizations that deal with biomedical information, like large research organizations.

          Thanks,
          Lewis
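
          The "coded around" Lewis mentions is commonly done by keeping the small core document in one collection and spilling the rare huge fields into a side collection. A hedged sketch, with hypothetical field and collection names:

              from pymongo import MongoClient

              db = MongoClient().example_db

              def save_drug(drug):
                  # "studies" is the field that can be huge for well-studied
                  # drugs like aspirin; everything else stays small.
                  studies = drug.pop("studies", [])
                  db.drugs.insert_one(drug)  # pymongo fills in _id if missing
                  if studies:
                      # One document per study keeps each piece well under
                      # the BSON document limit.
                      db.drug_studies.insert_many(
                          [{"drug_id": drug["_id"], "study": s} for s in studies]
                      )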


            People

            • Votes: 31
            • Watchers: 20

            Dates

            • Days since reply: 27 weeks ago