Python Driver / PYTHON-1403

BSON.encode and BSON.decode perform an extra copy by design

    • Type: Improvement
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: BSON

      Because BSON.encode returns a BSON instance (a bytes subclass) and BSON.decode requires one, each method makes an extra copy of the encoded bytes.

      For example, encoding a RawBSONDocument with bson.BSON.encode takes about 65% longer than calling bson._dict_to_bson directly (22.8 msec vs. 13.8 msec per loop):

      $ python -m timeit -s 'from bson import BSON, DEFAULT_CODEC_OPTIONS, _dict_to_bson; from bson.raw_bson import RawBSONDocument;raw = RawBSONDocument(BSON.encode({"s": "s"*1024*1024*15}))' 'BSON.encode(raw)'
      10 loops, best of 3: 22.8 msec per loop
      $ python -m timeit -s 'from bson import BSON, DEFAULT_CODEC_OPTIONS, _dict_to_bson; from bson.raw_bson import RawBSONDocument;raw = RawBSONDocument(BSON.encode({"s": "s"*1024*1024*15}))' '_dict_to_bson(raw, False, DEFAULT_CODEC_OPTIONS)'
      100 loops, best of 3: 13.8 msec per loop
      

      Perhaps we should add new encode and decode functions to work with bytes as BSON without the extra copy.
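
The extra copy can be illustrated without PyMongo. The sketch below is only an analogy: WrappedBytes is a hypothetical stand-in for a bytes subclass such as BSON, and encode_plain / encode_wrapped are illustrative names, not driver functions. It assumes the encoder has already produced a plain bytes buffer.

```python
class WrappedBytes(bytes):
    """Hypothetical stand-in for a bytes subclass such as bson.BSON."""


def encode_plain(payload: bytes) -> bytes:
    # Hand back the already-encoded buffer unchanged: no new object, no copy.
    return payload


def encode_wrapped(payload: bytes) -> WrappedBytes:
    # Building a bytes subclass from existing bytes allocates a new object
    # and copies the whole buffer into it.
    return WrappedBytes(payload)


payload = b"s" * (1024 * 1024)  # pretend this is an encoded document
plain = encode_plain(payload)
wrapped = encode_wrapped(payload)

same_object = plain is payload     # the plain path reuses the buffer
copied = wrapped is not payload    # the wrapped path created a new object
equal = wrapped == payload         # contents are identical either way
```

Functions that accept and return plain bytes, as proposed above, would correspond to the encode_plain path and avoid the second allocation.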

            Assignee: Unassigned
            Reporter: Shane Harvey (shane.harvey@mongodb.com)
            Votes: 0
            Watchers: 2
