  1. Python Driver
  2. PYTHON-2716

Optimize memory usage when encoding messages

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Reported here: https://developer.mongodb.com/community/forums/t/high-memory-usage-while-inserting-binary-file/106679/3

      The reporter claims that when they insert a document containing a 10MB binary string, the driver ends up using an additional 25MB.

      I discussed this with behackett, and his theory is that the extra memory comes from how our C extensions use the buffer.h API. When a buffer needs to grow, we simply double its size until it is large enough to accommodate the new write. So in this case:

      • Client inserts a document containing a 10MB binary string
      • pymongo enters the C extensions (_cbson_op_msg) and starts encoding the document to a buffer
      • pymongo calls buffer_write() with the 10MB byte-string
      • the buffer doubles in size until it reaches ~10-16MB
      • pymongo finishes encoding the message to the buffer
      • pymongo calls Py_BuildValue to convert the buffer to a Python bytes object, which creates a copy taking at least an extra 10MB
      • pymongo deallocates the buffer, freeing ~10-16MB
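
The arithmetic behind these steps can be sketched in a few lines of Python. This is only a model of the doubling strategy described above, not the driver's code: the 256-byte initial capacity and the helper names grown_capacity and peak_bytes are assumptions made for illustration, and the model ignores BSON framing overhead and any transient memory held during reallocation.

```python
MIB = 1024 * 1024


def grown_capacity(required: int, initial: int = 256) -> int:
    """Capacity reached by doubling `initial` until it holds `required` bytes.

    `initial` is an assumed starting size, not a value taken from the ticket.
    """
    capacity = initial
    while capacity < required:
        capacity *= 2
    return capacity


def peak_bytes(payload: int, initial: int = 256) -> int:
    """Peak additional memory: the grown buffer plus the copied bytes object."""
    return grown_capacity(payload, initial) + payload


payload = 10 * MIB
print(grown_capacity(payload) // MIB)  # buffer capacity in MiB -> 16
print(peak_bytes(payload) // MIB)      # buffer plus copy in MiB -> 26
```

Under these assumptions the buffer lands at the top of the ~10-16MB range, and the copy made on conversion brings the additional peak to roughly the 25MB the reporter observed.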

      In total, the peak additional memory is around 25MB. We should investigate whether it's possible to reduce the memory usage by using a zero-copy method to convert the internal buffer to a Python bytes object.
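
As a rough illustration of what "zero-copy" means here, the pure-Python sketch below contrasts a copying conversion with a view that shares the same memory. It is an analogy only: the bytearray stands in for the internal C buffer, and nothing here is the proposed change to the C extensions, whose eventual approach is still to be investigated.

```python
# A bytearray stands in for the driver's internal encode buffer (assumption).
buf = bytearray(b"encoded message")

# bytes(buf) allocates a second block and copies every byte into it.
copied = bytes(buf)

# memoryview(buf) exposes the same memory without copying the payload.
view = memoryview(buf)

# Mutating the underlying buffer shows which object shares its memory.
buf[0] = ord("E")

print(chr(view[0]))    # the view sees the change
print(chr(copied[0]))  # the copy still holds the old byte
print(view.obj is buf) # the view refers to the original buffer object
```

The copy costs memory proportional to the message size, while the view does not, which is the saving a zero-copy conversion would aim for.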

            Assignee: Unassigned
            Reporter: Shane Harvey (shane.harvey@mongodb.com)
            Votes: 1
            Watchers: 3
