Python Driver / PYTHON-1517

insert_many should work with arbitrarily long iterables

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: API
    • Environment:
      Verified on Windows, but the issue does not appear to be platform-specific

      Description

      The current documentation at http://api.mongodb.com/python/current/examples/bulk.html states: "A batch of documents can be inserted by passing a list to the insert_many() method. PyMongo will automatically split the batch into smaller sub-batches based on the maximum message size accepted by MongoDB, supporting very large bulk insert operations."
      I have a simple generator that uses json.load to produce dictionaries from bytes, which I pass to the insert_many method of the pymongo.Collection class. For large inputs this ultimately raises a MemoryError, always thrown at line 741 of https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py.
      I'm not an expert Python programmer, but after digging into the code it seems that this line expands the entire iterable into a list before the bulk insertion starts, so the size of individual documents is never taken into account when splitting them into batches.
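      Until insert_many accepts a streaming iterable, one possible workaround is to batch the generator manually and call insert_many once per batch. A minimal sketch (the chunk size of 1000 is an arbitrary assumption, and `collection` / `doc_generator` are hypothetical names for a pymongo.Collection and the reporter's generator):

      ```python
      from itertools import islice

      def batched(iterable, size):
          """Yield successive lists of at most `size` items from `iterable`,
          without ever materializing the whole iterable in memory."""
          it = iter(iterable)
          while True:
              chunk = list(islice(it, size))
              if not chunk:
                  return
              yield chunk

      # Hypothetical usage with a pymongo Collection:
      # for chunk in batched(doc_generator, 1000):
      #     collection.insert_many(chunk)
      ```

      This caps peak memory at roughly one chunk of documents, though unlike a driver-side fix it splits by document count rather than by encoded message size.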


               People

               Assignee:
               Unassigned
               Reporter:
               Fernando Almeida (falmeida)
               Votes:
               0
               Watchers:
               3

                Dates

                Created:
                Updated: