Core Server / SERVER-431

Increase the 4mb BSON Object Limit to 16mb

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.4
    • Component/s: None
    • Labels: None

      Description

      Mostly for tracking who/how many others are interested in this, but it would be nice to have the option of >4MB objects.

      My specific use case is storing Twitter social graph data. It's not too much of an issue at the moment, as it takes about a million IDs to overflow the limit, but it would be a "nice to have" not to need to hack up some other solution.
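
      For reference, the kind of "other solution" the reporter would rather not hack up might look like this minimal bucketing sketch, assuming pymongo; the database name, collection name, field names, and bucket size are all hypothetical.

          from pymongo import MongoClient

          BUCKET_SIZE = 100_000  # IDs per bucket document; keeps each one well under 4MB

          client = MongoClient()
          buckets = client.twitter.follower_buckets  # hypothetical database/collection

          def save_followers(user_id, follower_ids):
              """Split one user's follower list across numbered bucket documents."""
              for seq, start in enumerate(range(0, len(follower_ids), BUCKET_SIZE)):
                  buckets.insert_one({
                      "user_id": user_id,
                      "seq": seq,
                      "follower_ids": follower_ids[start:start + BUCKET_SIZE],
                  })

          def load_followers(user_id):
              """Reassemble the full list by reading the buckets back in order."""
              ids = []
              for doc in buckets.find({"user_id": user_id}).sort("seq", 1):
                  ids.extend(doc["follower_ids"])
              return ids

      A compound index on ("user_id", "seq") would keep the reassembly query cheap.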

        Issue Links

          Activity

          Eliot Horowitz added a comment -

          We still believe the benefits of limiting to a fixed size outweigh the benefits of no max size.

          Can you open a new ticket to track interest/thoughts?

          This ticket won't change for sure, and definitely not before 1.8.

          Ron Mayer added a comment -

          Eliot wrote: "There is always going to be a limit, even if its crazy high like 2gb. So its really a question of what it is."

          If that's the question, my vote would be for "crazy high like 2gb".

          Well over 99.99% of the documents I'm storing fit comfortably in 4MB. However, the source data we're bringing into MongoDB (XML docs in this format: http://www.niem.gov/index.php from hundreds of government systems) doesn't have any hard constraint on document size.

          Yes, it's understandable that a huge document would be slow.

          No, it's not an option to simply drop the document.

          And it does kinda suck to have to code differently for the one-in-ten-thousand large documents.

          Roger Binns added a comment -

          Is there a ticket for getting rid of this limit (or having it like John suggested)?

          I'm now hitting the 16MB limit, which means I have to write and test two code paths: one for the majority of the data and one for the outliers. We don't run MongoDB on any machine with less than 32GB of RAM, so the current arbitrary limit does not help me in any way. In fact, it wastes my time on writing and testing extra code.
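
          The two code paths described here might look roughly like the following sketch, assuming pymongo and its bundled gridfs module; the size check, collection names, and stub field are illustrative assumptions, not a recommendation.

              import bson
              import gridfs
              from pymongo import MongoClient

              MAX_BSON = 16 * 1024 * 1024  # server-enforced per-document limit

              client = MongoClient()
              db = client.mydb          # hypothetical database
              fs = gridfs.GridFS(db)

              def store(doc):
                  encoded = bson.BSON.encode(doc)
                  if len(encoded) < MAX_BSON:
                      # Common path: the document fits, insert it directly.
                      db.items.insert_one(doc)
                  else:
                      # Outlier path: park the oversized payload in GridFS and
                      # insert a small stub document that points at it.
                      file_id = fs.put(encoded)
                      db.items.insert_one({"gridfs_id": file_id})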

          Lewis Geer added a comment -

          Hi,

          Sorry to comment on an old ticket, but there are real-world use cases for large documents, especially in biomedical applications. Say we have a collection of possible drugs. About some of these drugs we know almost nothing, perhaps a registry name, a supplier, and a chemical structure. About others, like aspirin or penicillin, we know a whole lot: clinical studies, pharmacology, and so on. So the average document is relatively small, but a few documents are huge. You can't omit these huge documents, as they are of great interest.

          This happens over and over again in biomedical databases. For example, you might know a lot about an organism named "human" but not a lot about "tasseled wobbegongs" and most other organisms. Of course, this can be coded around, but it would be nice not to be forced to do so, and not forcing it might help adoption of MongoDB in organizations that deal with biomedical information, like large research organizations.

          Thanks,
          Lewis
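
          The kind of coding-around alluded to above might look like this sketch, assuming pymongo: keep the small core drug document intact and move the occasionally huge sections into a side collection. All database, collection, and field names here are hypothetical.

              from pymongo import MongoClient

              client = MongoClient()
              db = client.biomed  # hypothetical database

              def save_drug(record):
                  """Store the small core record; spill the bulky sections elsewhere."""
                  studies = record.pop("clinical_studies", [])
                  db.drugs.insert_one(record)          # pymongo adds "_id" to `record`
                  for study in studies:
                      study["drug_id"] = record["_id"]
                      db.clinical_studies.insert_one(study)

              def load_drug(drug_id):
                  """Rebuild the full record, pulling the studies from the side collection."""
                  record = db.drugs.find_one({"_id": drug_id})
                  record["clinical_studies"] = list(
                      db.clinical_studies.find({"drug_id": drug_id}))
                  return record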

          senthil added a comment -

          We are in the book publishing industry. We store a lot of book metadata and feed it back into reports. Even though we can afford better infrastructure (more than 256 GB of RAM, quad-core multiprocessor machines, and SSDs), the limit still does not meet our requirements. Please don't keep this restriction, as it's a hindrance for users of MongoDB.


            People

            • Votes: 31
            • Watchers: 22

              Dates

              • Created:
                Updated:
                Resolved: