Type: New Feature
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.8
Affects Version/s: 2.2.1
Component/s: BSON

Epic Link:
Flexible BSON Encoder/Decoder

Prologue

Since I am working on someone else's dime, I need to justify
why I am spending so much time on something completely different.
Hence I must be able to at least say "I tried".

So a simple yes or no is sufficient. You do not have to explain
your decision. Both are just fine with me.

If you say yes – wonderful. If you say no, I have leverage that
my employer might actually buy some support to get this into the
driver (and then some more to take the administrative burden off my
shoulders). That will probably be cheaper than me maintaining a fork of
the pymongo driver. As I said, break as much as possible in each new
release, so that I become more expensive than official support!

However, our application still has the need for fast on-the-fly
conversion, which I cannot justify by just saying
the driver does not support it. So here goes:

Judging by your comments, you seem to dislike the names of the callbacks:

This module provides two methods: `object_hook` and `default`. These
names are pretty terrible, but match the names used in Python's `json`
library ...

But do you dislike the concept? Because here it comes again (this time
borrowed from pickle, copy and MarkupSafe; the names are
inspired by emacs(1)).

Risks

None. All features have to be explicitly enabled. But on the other
hand – features are always dangerous and terrible things can happen :9.

Using the __bson__ object state hook does not present a problem,
since no other application should be using it.

Feature

When an element cannot be encoded – given that all features are
enabled – the following object methods are tried in order before an
exception is raised:

__bson__(self)
__getstate__(self)

__bson__ may return any type of data; __getstate__ is
required to return a dict. If both fail, an attempt is made to
retrieve the object's __dict__ attribute for encoding.
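
The lookup order described above can be sketched in plain Python. This is only an illustration, not the code from the linked commit: the helper name resolve_state and the three example classes are hypothetical, the per-feature switches are left out, and treating a non-dict __getstate__ result as a failure is an assumption.

```python
class Point(object):
    """Hypothetical class using the dedicated BSON hook."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __bson__(self):
        # May return any encodable value, not only a dict.
        return [self.x, self.y]


class Account(object):
    """Hypothetical class using the pickle-style hook."""

    def __init__(self, owner):
        self.owner = owner
        self._cache = {}  # transient, should not be stored

    def __getstate__(self):
        # Required to return a dict.
        return {"owner": self.owner}


class Plain(object):
    """Hypothetical class without any hook."""

    def __init__(self):
        self.value = 42


def resolve_state(obj):
    """Return an encodable stand-in for obj (hypothetical helper)."""
    bson_hook = getattr(obj, "__bson__", None)
    if callable(bson_hook):
        return bson_hook()
    getstate = getattr(obj, "__getstate__", None)
    if callable(getstate):
        # Since Python 3.11 every object has a default __getstate__,
        # which may return None; only a dict is accepted here.
        state = getstate()
        if isinstance(state, dict):
            return state
    try:
        return vars(obj)  # the object's __dict__
    except TypeError:
        raise TypeError("cannot encode object: %r" % (obj,))
```

With these definitions, a Point resolves through __bson__, an Account through __getstate__, and a Plain instance through its __dict__; a value such as an int, which offers none of the three, still raises an exception.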

Each feature is turned on and off independently with the following
functions:

enable_bson_hook([True|False])
enable_getstate_hook([True|False])
enable_dict_hook([True|False])

The activation status is checked with the following functions:

is_bson_hook_enabled()
is_getstate_hook_enabled()
is_dict_hook_enabled()
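
To show how such independent switches behave, here is a minimal stand-in using a module-level table. It only mirrors the function names listed above; the bodies, the _hooks table and the default argument value of True are assumptions, not the implementation from the fork.

```python
# Hypothetical stand-ins for the switches named above.
# All features start disabled, since they have to be enabled explicitly.
_hooks = {"bson": False, "getstate": False, "dict": False}


def enable_bson_hook(enable=True):
    _hooks["bson"] = bool(enable)


def enable_getstate_hook(enable=True):
    _hooks["getstate"] = bool(enable)


def enable_dict_hook(enable=True):
    _hooks["dict"] = bool(enable)


def is_bson_hook_enabled():
    return _hooks["bson"]


def is_getstate_hook_enabled():
    return _hooks["getstate"]


def is_dict_hook_enabled():
    return _hooks["dict"]


# Turning one feature on leaves the others untouched.
enable_getstate_hook(True)
status = (is_bson_hook_enabled(),
          is_getstate_hook_enabled(),
          is_dict_hook_enabled())
enable_getstate_hook(False)
```

Because the table is process-wide in this sketch, every thread sees the same settings.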

The feature is available at
https://github.com/wolfmanx/mongo-python-driver/commit/f61a96805c37e16d5544dd79ec126c1d98ea9550

Benefits

For the driver: None.

For an application:

This feature eliminates the need to copy and hold data. Given that
MongoDB already competes with the application for memory, this is a
good thing.

Depending on the structure of the data and the availability of the C
extension, the time savings are significant.

Here is a rough comparison of the best officially supported
method and the __getstate__ feature for encoding 100 documents of
mixed object types with 1000 fields each:

Python 2 - C extension

:INF: waste_some_space_and_time_converting_data_with_BSON_encode()
total_time (100/1000) : 0:00:00.524321 factor: 44.63
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.015852 factor: 1.35

Python 3 - C extension

:INF: waste_some_space_and_time_converting_data_with_BSON_encode()
total_time (100/1000) : 0:00:00.612010 factor: 39.14
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.020740 factor: 1.33

Python 2 - pure Python

:INF: waste_some_space_and_time_converting_data_with_json_bson_default()
total_time (100/1000) : 0:00:01.231634 factor: 4.57
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.486110 factor: 1.8

Python 3 - pure Python

:INF: waste_some_space_and_time_converting_data_with_json_bson_default()
total_time (100/1000) : 0:00:00.983371 factor: 5.36
:INF: dont_waste_time_with_getstate_hook_feature()
total_time (100/1000) : 0:00:00.349842 factor: 1.91

Note: The reference time for each separate test run is different.
I.e., the times and factors are not comparable between different
setups.

Disadvantages

This is not the best solution. A default/object_pair_hook
callback pair (as implemented in the json module) would be ideal,
but I don't have the time to implement the necessary API and pass it
down through the entire pymongo DB layer.
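
For reference, this is what such a callback pair looks like in the standard json module, using its default and object_hook parameters. The "__date__" marker key and both callback bodies are made up for this illustration; nothing here is pymongo API.

```python
import json
from datetime import date


def default(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, date):
        # "__date__" is an arbitrary marker chosen for this example.
        return {"__date__": obj.isoformat()}
    raise TypeError("not JSON serializable: %r" % (obj,))


def object_hook(dct):
    """Called by json.loads for every decoded JSON object."""
    if set(dct) == {"__date__"}:
        year, month, day = map(int, dct["__date__"].split("-"))
        return date(year, month, day)
    return dct


text = json.dumps({"due": date(2012, 7, 18)}, default=default)
restored = json.loads(text, object_hook=object_hook)
```

The callbacks are passed per call, so two threads can use different conversions without sharing any global state.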

With a thread-local configuration, this solution would be a very good
runner-up.

Caveats

Without a thread-local configuration, enabling the __getstate__
and __dict__ features without proper locking can lead to
unexpected results in concurrent threads encoding BSON data.

A documented global module lock would already go a long way.
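
Both options can be sketched with the threading module. All names below are hypothetical and the flag handling is a simplified stand-in for the real switches: the first part guards a process-wide flag with a module lock, the second keeps the flag per thread.

```python
import threading
from contextlib import contextmanager

# Option 1: a global module lock around a process-wide flag.
_config_lock = threading.RLock()
_getstate_hook_enabled = False


def is_getstate_hook_enabled():
    return _getstate_hook_enabled


@contextmanager
def getstate_hook(enabled=True):
    """Hold the lock while the flag is changed and the encoding runs."""
    global _getstate_hook_enabled
    with _config_lock:
        previous = _getstate_hook_enabled
        _getstate_hook_enabled = enabled
        try:
            yield
        finally:
            # Restore the earlier setting, even after an exception.
            _getstate_hook_enabled = previous


# Option 2: a thread-local flag; each thread starts with it disabled.
_local = threading.local()


def enable_getstate_hook_for_thread(enable=True):
    _local.getstate_hook = bool(enable)


def is_getstate_hook_enabled_for_thread():
    return getattr(_local, "getstate_hook", False)
```

The lock only helps if every thread that encodes BSON data also takes it, which is why it would have to be documented. The thread-local variant needs no cooperation, since a setting made in one thread is invisible to the others.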

Epilogue

Please, forgive me, I'm just an old dog, eager to learn a new trick,
but still hanging on to his old tricks, too.

depends on

PYTHON-1750 Support codec callbacks for simple types

Closed

Assignee: Prashant Mital (Inactive)
Reporter: Wolfgang Scherer
Votes: 2
Watchers: 5

Created: Jul 18 2012 10:28:39 PM UTC
Updated: Mar 18 2019 07:12:33 PM UTC
Resolved: Mar 18 2019 07:10:49 PM UTC
