BSON.encode and BSON.decode perform an extra copy by design


    • Type: Improvement
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: BSON

      Because BSON.encode wraps its result in a BSON instance and BSON.decode requires a BSON instance as input, each call makes an extra copy of the encoded bytes.
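
      The effect can be illustrated without PyMongo. This is a minimal sketch that assumes BSON behaves like an ordinary bytes subclass; WrappedBytes, encode_raw and encode_wrapped are hypothetical names used only for illustration and are not part of the bson package. Building a subclass instance from existing bytes allocates a new object and copies the buffer, whereas returning the bytes as-is does not.

```python
class WrappedBytes(bytes):
    """Hypothetical stand-in for a bytes subclass such as bson.BSON."""


def encode_raw(payload: bytes) -> bytes:
    # Hand back the already-serialized bytes untouched: no new object.
    return payload


def encode_wrapped(payload: bytes) -> WrappedBytes:
    # Constructing a bytes subclass from existing bytes allocates a new
    # object and copies the underlying buffer into it.
    return WrappedBytes(payload)


payload = b"s" * (1024 * 1024)  # stand-in for an encoded document

raw_result = encode_raw(payload)
wrapped_result = encode_wrapped(payload)

same_object = raw_result is payload      # no copy was made
copied = wrapped_result is not payload   # a distinct object was created
```

      The wrapped result still compares equal to the original bytes, so the cost is purely the extra allocation and copy, which grows with document size.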

      For example, encoding a RawBSONDocument with bson.BSON.encode takes roughly 65% longer than calling bson._dict_to_bson directly (22.8 msec vs. 13.8 msec per loop):

      $ python -m timeit -s 'from bson import BSON, DEFAULT_CODEC_OPTIONS, _dict_to_bson; from bson.raw_bson import RawBSONDocument;raw = RawBSONDocument(BSON.encode({"s": "s"*1024*1024*15}))' 'BSON.encode(raw)'
      10 loops, best of 3: 22.8 msec per loop
      $ python -m timeit -s 'from bson import BSON, DEFAULT_CODEC_OPTIONS, _dict_to_bson; from bson.raw_bson import RawBSONDocument;raw = RawBSONDocument(BSON.encode({"s": "s"*1024*1024*15}))' '_dict_to_bson(raw, False, DEFAULT_CODEC_OPTIONS)'
      100 loops, best of 3: 13.8 msec per loop
      

      Perhaps we should add new encode and decode functions that work directly on plain bytes as BSON, avoiding the extra copy.

              Assignee:
              Unassigned
              Reporter:
              Shane Harvey