Python Driver / PYTHON-1284

Instantiating many small RawBSONDocuments is inefficient

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.5
    • Affects Version/s: 3.4
    • Component/s: None
    • Labels: None
    • Minor Change

      Starting in PyMongo 3.4, we can retrieve raw BSON documents from a cursor:

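      from bson.codec_options import CodecOptions
      from bson.raw_bson import RawBSONDocument
      from pymongo import MongoClient
      db = MongoClient().test  # assumes a mongod on localhost:27017
      options = CodecOptions(document_class=RawBSONDocument)
      for doc in db.get_collection("collection", codec_options=options).find():
          # doc is a RawBSONDocument; doc.raw holds its undecoded BSON bytes.
          raw_bytes = doc.raw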
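doc.raw is simply the bytes of one BSON document, which a library such as Python-BSONJS parses on its own. As a rough illustration of what that involves, here is a toy reader written for this note (not how Python-BSONJS or PyMongo actually decode). The function name decode_flat_bson is hypothetical. It walks the BSON layout directly: an int32 length prefix, then elements made of a type byte, a NUL-terminated name, and a value, then a trailing NUL. It handles only doubles, strings, and 32-bit integers.

```python
import struct


def decode_flat_bson(raw: bytes) -> dict:
    """Toy decoder for a flat BSON document holding only doubles, strings, and int32s."""
    (total,) = struct.unpack_from("<i", raw, 0)  # little-endian int32 length prefix
    if total != len(raw) or raw[-1] != 0:
        raise ValueError("not exactly one well-formed BSON document")
    result = {}
    pos = 4
    end = total - 1  # index of the document's trailing NUL byte
    while pos < end:
        elem_type = raw[pos]
        name_end = raw.index(b"\x00", pos + 1)  # element names are NUL-terminated
        name = raw[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if elem_type == 0x01:  # 64-bit float
            (value,) = struct.unpack_from("<d", raw, pos)
            pos += 8
        elif elem_type == 0x02:  # string: int32 length (incl. NUL), bytes, NUL
            (length,) = struct.unpack_from("<i", raw, pos)
            value = raw[pos + 4:pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif elem_type == 0x10:  # 32-bit integer
            (value,) = struct.unpack_from("<i", raw, pos)
            pos += 4
        else:
            raise ValueError("unsupported BSON type 0x%02x" % elem_type)
        result[name] = value
    return result


# {"x": 1, "s": "hi"} encoded by hand: 4 + 7 + 10 + 1 = 22 bytes.
example = (
    struct.pack("<i", 22)
    + b"\x10x\x00" + struct.pack("<i", 1)
    + b"\x02s\x00" + struct.pack("<i", 3) + b"hi\x00"
    + b"\x00"
)
print(decode_flat_bson(example))  # {'x': 1, 's': 'hi'}
```

A real decoder must handle every BSON type and nested documents; the point here is only that raw bytes can be consumed without building an intermediate dict first.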
      

      This is mainly intended for speed in libraries like Python-BSONJS that can decode raw BSON themselves. If the documents are small, however, instantiating a RawBSONDocument for each one is much more costly than decoding the BSON to a dict would have been. Most of that overhead comes from making a copy of the CodecOptions for every RawBSONDocument.
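The cost difference can be modeled in plain Python. In this sketch, Options, CopyingDoc, and SharingDoc are hypothetical stand-ins, not PyMongo classes: Options is a namedtuple, CopyingDoc derives a fresh options tuple for every instance (the per-document copy described above), and SharingDoc only stores a reference to an options object the caller already built. Absolute timings depend on the machine; what matters is that the copy is extra work repeated once per document.

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for CodecOptions: an immutable named tuple.
Options = namedtuple("Options", ["document_class", "tz_aware"])


class CopyingDoc:
    """Models the costly pattern: derive a new options tuple per instance."""

    __slots__ = ("raw", "options")

    def __init__(self, raw, options):
        self.raw = raw
        # _replace allocates a fresh tuple on every call.
        self.options = options._replace(document_class=CopyingDoc)


class SharingDoc:
    """Models the cheaper pattern: store a reference to existing options."""

    __slots__ = ("raw", "options")

    def __init__(self, raw, options):
        self.raw = raw
        self.options = options


caller_options = Options(document_class=dict, tz_aware=False)
shared_options = Options(document_class=SharingDoc, tz_aware=False)

copies = [CopyingDoc(b"", caller_options) for _ in range(3)]
shares = [SharingDoc(b"", shared_options) for _ in range(3)]

if __name__ == "__main__":
    n = 100_000
    t_copy = timeit.timeit(lambda: CopyingDoc(b"", caller_options), number=n)
    t_share = timeit.timeit(lambda: SharingDoc(b"", shared_options), number=n)
    print("copying: %.3fs, sharing: %.3fs for %d instances" % (t_copy, t_share, n))
```

Every CopyingDoc ends up with its own options object, while every SharingDoc points at the same one.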

      To avoid the copy, we will require that the CodecOptions passed to a RawBSONDocument have document_class set to RawBSONDocument. In normal usage, like the example above, that is always true. However, some direct uses of RawBSONDocument will now be prohibited:

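      from bson.binary import JAVA_LEGACY
      from bson.codec_options import CodecOptions
      from bson.raw_bson import RawBSONDocument

      # bson_string holds the bytes of a single BSON document.

      # Now prohibited: document_class defaults to dict, so this raises TypeError.
      document = RawBSONDocument(
          bson_string,
          codec_options=CodecOptions(uuid_representation=JAVA_LEGACY))

      # Do this instead.
      document = RawBSONDocument(
          bson_string,
          codec_options=CodecOptions(uuid_representation=JAVA_LEGACY,
                                     document_class=RawBSONDocument))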
      

            Assignee: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Reporter: A. Jesse Jiryu Davis (jesse@mongodb.com)
            Votes: 0
            Watchers: 1
