- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: API
- Environment: Verified on Windows, but this does not appear to be platform-specific
The current documentation at http://api.mongodb.com/python/current/examples/bulk.html states: "A batch of documents can be inserted by passing a list to the insert_many() method. PyMongo will automatically split the batch into smaller sub-batches based on the maximum message size accepted by MongoDB, supporting very large bulk insert operations."
I have a simple generator that yields dictionaries decoded from bytes with json.load, which I pass to the insert_many() method of the pymongo.collection.Collection class. For large inputs this ultimately raises a MemoryError, always thrown at line 741 of https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py.
I'm not an expert Python programmer, but after digging into the code a bit, it appears that this line expands the entire iterable into a list before the bulk insertion process starts, so the driver never takes the size of individual documents into account when splitting them into batches.
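For illustration, here is a minimal sketch of one possible workaround: consume the generator in fixed-size chunks so that only one chunk of documents is held in memory at a time. The insert_in_chunks helper, the doc_stream generator, and the chunk size of 1000 are hypothetical names and choices for this sketch, not part of PyMongo.

from itertools import islice

from pymongo import MongoClient


def insert_in_chunks(collection, doc_iter, chunk_size=1000):
    """Consume an iterator of documents in fixed-size chunks so that at most
    chunk_size documents are materialized at a time, instead of letting
    insert_many() expand the entire iterable into one list up front."""
    it = iter(doc_iter)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        collection.insert_many(chunk)


# Hypothetical usage: doc_stream() stands in for the JSON-decoding
# generator described above.
# client = MongoClient()
# insert_in_chunks(client.test.docs, doc_stream())

This only sidesteps the problem from the caller's side; it still batches by document count rather than by encoded message size, which is what the documented behavior would ideally do for iterables as well as lists.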
- depends on: PYTHON-1752 bulk_write should be able to accept a generator (Backlog)
- is related to: MOTOR-314 Allow BULK operations to accept generators (Blocked)