- Type: Improvement
- Resolution: Unresolved
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
Reported here: https://developer.mongodb.com/community/forums/t/high-memory-usage-while-inserting-binary-file/106679/3
The reporter claims that when they insert a document containing a 10MB binary string, the driver uses an additional 25MB of memory.
I discussed this with behackett, and his theory is that the extra memory comes from how we use the buffer.h API in our C extensions. When a buffer needs to grow, we simply double its size until it is large enough to accommodate the new write. So in this case:
- Client inserts a document containing a 10MB binary string
- pymongo enters the C extensions (_cbson_op_msg) and starts encoding the document to a buffer
- pymongo calls buffer_write() with the 10MB byte-string
- the buffer doubles in size until it reaches ~10-16MB
- pymongo finishes encoding the message to the buffer
- pymongo calls Py_BuildValue to convert the buffer to a Python bytes object, which creates a copy that takes at least an extra 10MB.
- we deallocate the buffer to free ~10-16MB.
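The arithmetic behind these steps can be sketched in Python. This is only a rough model, not the driver's code: the 256-byte starting capacity and the helper name `grown_capacity` are assumptions made for illustration, and BSON/message framing overhead is ignored.

```python
INITIAL_BUFFER_SIZE = 256  # assumed starting capacity in bytes (illustrative)
MIB = 1024 * 1024


def grown_capacity(capacity: int, required: int) -> int:
    """Double `capacity` until it can hold `required` bytes."""
    while capacity < required:
        capacity *= 2
    return capacity


payload = 10 * MIB  # the 10MB binary string being encoded
capacity = grown_capacity(INITIAL_BUFFER_SIZE, payload)

# The bytes object built from the buffer is a second copy of the payload.
copy = payload

# Buffer and copy are both alive until the buffer is deallocated.
peak_extra = capacity + copy

print(capacity // MIB, peak_extra // MIB)  # 16 26
```

Under these assumptions the buffer lands on the next power-of-two multiple of its starting size (16MiB), and the temporary overlap with the copy gives roughly 26MiB of extra memory.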
So in total, peak memory usage is around 25MB. We should investigate whether we can reduce memory usage by using a zero-copy method to convert the internal buffer to a Python bytes object.
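To illustrate the difference between a copy and a zero-copy hand-off, here is a small Python-level sketch. It is an analogy only: a `bytearray` stands in for the internal buffer and a `memoryview` stands in for a zero-copy result. In the C extension itself, one possible direction (an assumption, not something verified against the current code) would be to allocate the result up front with `PyBytes_FromStringAndSize(NULL, n)` and encode directly into it.

```python
import tracemalloc

MIB = 1024 * 1024
buf = bytearray(10 * MIB)  # stands in for the internal encoding buffer

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

view = memoryview(buf)  # zero-copy: shares buf's memory
after_view, _ = tracemalloc.get_traced_memory()

copied = bytes(buf)  # copy: allocates a second ~10MiB block
after_copy, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

view_cost = after_view - before      # small, only the view object itself
copy_cost = after_copy - after_view  # roughly the size of the payload

# Writes to the buffer are visible through the view but not in the copy.
buf[0] = 0xFF
shared = view[0] == 0xFF
independent = copied[0] == 0
```

The view costs only a small fixed amount regardless of payload size, while the copy costs about as much as the payload. That is the saving a zero-copy conversion would aim for.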