Discovered in SERVER-38313 and reported by jesse over email:
bson.BSON.encode() is a classmethod, so you're supposed to write: bson.BSON.encode(b'...')
But bson.BSON.decode() is an instance method, you're supposed to write: bson.BSON(b'...').decode()
This asymmetry between two opposite methods invites this mistake:
bson.BSON.decode(b'...')
The code looks right but in fact it's weird: it's calling an instance method and explicitly passing a bytes object as "self". Bizarrely, this works in Python 3, but in Python 2 it raises "TypeError: unbound method decode() must be called with BSON instance as first argument (got str instance instead)". Of course, in Python 2 bytes and str are the same, but it's surprising that Python 2 requires "self" to be a BSON instance, whereas Python 3 allows it to be a superclass of BSON.
This is especially confusing because our docs use the classmethod invocation of decode:
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON.decode(data)
Personally I think we should move away from BSON class altogether because it results in the unavoidable memory copies described in PYTHON-1403. Instead, we can represent encoded bson documents as bytes (str in Python 2) and add two functions bson.encode(document, ...) and bson.decode(bson_data, ...). This would avoid the problem in PYTHON-1403 and fix the confusing asymmetry in the current API.
For now let's just fix the docs to use BSON(data).decode() instead of BSON.decode(data).
- is related to
-
PYTHON-1785 Provide utility encode and decode methods in BSON module
- Closed
- related to
-
SERVER-38313 Update mongo pretty printer to be compatible with more versions of python
- Closed
-
PYTHON-1403 BSON.encode and BSON.decode perform an extra copy by design
- Closed