Python Driver / PYTHON-381

BSON should support some type of object state callback


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.2.1
    • Fix Version/s: 3.8
    • Component/s: BSON
    • Labels:
      None
    • Sprint:
      Python Sprint 40, Python Sprint 41, Python Sprint 42

      Description

      Prologue

      Since I am working on someone else's dime, I need a
      justification for spending so much time on something completely different.
      Hence I must be able to at least say "I tried".

      So a simple yes or no is sufficient. You do not have to explain
      your decision. Both are just fine with me.

      If you say yes – wonderful. If you say no, I have leverage that
      my employer might actually buy some support to get this into the
      driver (and then some more to take the administrative burden off my
      shoulders). That will probably be cheaper than me maintaining a fork of
      the pymongo driver. As I said, break as much as possible in each new
      release, so that I become more expensive than official support!

      However, our application still has the need for fast on-the-fly
      conversion, which I cannot justify by just saying
      the driver does not support it. So here goes:

      Judging by your comments, you seem to dislike the names of the callbacks:

      This module provides two methods: `object_hook` and `default`. These
      names are pretty terrible, but match the names used in Python's `json`
      library ...

      But do you dislike the concept? Because here it comes again (this time
      borrowed from pickle, copy and MarkupSafe, the names are
      inspired by emacs(1)).

      Risks

      None. All features have to be explicitly enabled. But on the other
      hand – features are always dangerous and terrible things can happen :9.

      Using the __bson__ object state hook does not present a problem,
      since no other application should be using it.

      Feature

      When an element cannot be encoded – given that all features are
      enabled – the following object methods are tried in order before an
      exception is raised:

      • __bson__(self)
      • __getstate__(self)

      __bson__ may return any type of data, whereas __getstate__ is
      required to return a dict. If both fail, an attempt is made to
      retrieve the object's __dict__ attribute for encoding.
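
      The lookup order described above can be sketched in pure Python as
      follows. This is only an illustration, not the code from the fork:
      the names resolve_state and NoEncodableState are hypothetical, and
      treating a non-dict __getstate__ result as "failed" is an assumption
      about what "fail" means here.

```python
class NoEncodableState(TypeError):
    """Hypothetical error: no enabled hook produced encodable data."""


def resolve_state(obj, use_bson=True, use_getstate=True, use_dict=True):
    """Return data to encode for obj, trying the hooks in the stated order."""
    if use_bson:
        hook = getattr(obj, "__bson__", None)
        if callable(hook):
            return hook()  # __bson__ may return any type of data
    if use_getstate:
        hook = getattr(obj, "__getstate__", None)
        if callable(hook):
            state = hook()
            if isinstance(state, dict):  # __getstate__ must yield a dict
                return state
            # Assumption: a non-dict result counts as a failed hook.
    if use_dict:
        state = getattr(obj, "__dict__", None)
        if isinstance(state, dict):
            return state  # no copy is made, the instance dict is used as is
    raise NoEncodableState("cannot encode object of type %s" % type(obj).__name__)
```

      An encoder would call such a function only after its normal type
      dispatch has failed, so objects of natively supported types are not
      affected.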

      Each feature is turned on and off independently with the following
      functions:

      • enable_bson_hook([True|False])
      • enable_getstate_hook([True|False])
      • enable_dict_hook([True|False])

      The activation status is checked with the following functions:

      • is_bson_hook_enabled()
      • is_getstate_hook_enabled()
      • is_dict_hook_enabled()
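
      For illustration, switches of this kind could be backed by
      module-level flags. The function names below are the ones listed
      above; everything else (the _enabled dict, the _make_switch helper,
      a default argument of True) is an assumption and not taken from the
      linked commit.

```python
# Module-wide flags; every hook starts disabled, so each feature
# has to be enabled explicitly.
_enabled = {"bson": False, "getstate": False, "dict": False}


def _make_switch(name):
    """Build an (enable, is_enabled) function pair for one hook (hypothetical helper)."""
    def enable(flag=True):
        _enabled[name] = bool(flag)

    def is_enabled():
        return _enabled[name]

    return enable, is_enabled


enable_bson_hook, is_bson_hook_enabled = _make_switch("bson")
enable_getstate_hook, is_getstate_hook_enabled = _make_switch("getstate")
enable_dict_hook, is_dict_hook_enabled = _make_switch("dict")
```

      Because the flags live at module level, they apply to every thread
      in the process.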

      The feature is available at
      https://github.com/wolfmanx/mongo-python-driver/commit/f61a96805c37e16d5544dd79ec126c1d98ea9550

      Benefits

      For the driver: None.

      For an application:

      This feature eliminates the need to copy and hold data. Given that
      mongodb already competes with the application for memory, this is a
      good thing.

      Depending on the structure of the data and the availability of the C
      extension, the time savings are significant.

      Here is a rough comparison between the best officially supported
      method and the __getstate__ feature for encoding 100 documents of
      mixed object types with 1000 fields each:

      Python 2 - C extension

      1. :INF: waste_some_space_and_time_converting_data_with_BSON_encode()
        total_time (100/1000) : 0:00:00.524321 factor: 44.63
      2. :INF: dont_waste_time_with_getstate_hook_feature()
        total_time (100/1000) : 0:00:00.015852 factor: 1.35

      Python 3 - C extension

      1. :INF: waste_some_space_and_time_converting_data_with_BSON_encode()
        total_time (100/1000) : 0:00:00.612010 factor: 39.14
      2. :INF: dont_waste_time_with_getstate_hook_feature()
        total_time (100/1000) : 0:00:00.020740 factor: 1.33

      Python 2 - pure Python

      1. :INF: waste_some_space_and_time_converting_data_with_json_bson_default()
        total_time (100/1000) : 0:00:01.231634 factor: 4.57
      2. :INF: dont_waste_time_with_getstate_hook_feature()
        total_time (100/1000) : 0:00:00.486110 factor: 1.8

      Python 3 - pure Python

      1. :INF: waste_some_space_and_time_converting_data_with_json_bson_default()
        total_time (100/1000) : 0:00:00.983371 factor: 5.36
      2. :INF: dont_waste_time_with_getstate_hook_feature()
        total_time (100/1000) : 0:00:00.349842 factor: 1.91

      Note: The reference time for each separate test run is different.
      I.e., the times and factors are not comparable between different
      setups.

      Disadvantages

      This is not the best solution. A default/object_pairs_hook
      callback pair (as implemented in the json module) would be ideal,
      but I don't have the time to implement the necessary API, passing it
      down through the entire pymongo DB layer.

      With a thread local configuration this solution would be a very good
      runner-up.
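
      A thread-local configuration could look roughly like the sketch
      below, which relies on the standard threading.local class. The
      names HookConfig, config and seen_in_other_thread are hypothetical
      and do not exist in the driver or in the fork.

```python
import threading


class HookConfig(threading.local):
    """Per-thread hook switches; the class attributes are the defaults."""
    bson = False
    getstate = False
    dict_hook = False


config = HookConfig()


def seen_in_other_thread(attr):
    """Return the value of a switch as observed from a freshly started thread."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(getattr(config, attr)))
    worker.start()
    worker.join()
    return seen[0]


# Enabling a hook in the current thread leaves other threads untouched.
config.getstate = True
```

      With this layout each thread enables only the hooks it needs, and
      no locking is required around the switches themselves.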

      Caveats

      Without a thread local configuration, enabling the __getstate__
      and __dict__ features without proper locking can lead to
      unexpected results in concurrent threads encoding BSON data.

      A documented global module lock would already go a long way.

      Epilogue

      Please, forgive me, I'm just an old dog, eager to learn a new trick,
      but still hanging on to his old tricks, too.
