Details
Bug
Status: Closed
Trivial - P5
Resolution: Fixed
Minor Change
Description
PyMongo allows inserting invalid UTF-8 via a Regex instance. It then fails to decode the resulting document unless unicode_decode_error_handler is overridden:
>>> has_c()
False
>>> b'\xed\xbc\xad'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
>>> from bson import encode, decode
>>> data = encode({'a':Regex(b'\xed\xbc\xad','')})
b'\r\x00\x00\x00\x0ba\x00\xed\xbc\xad\x00\x00\x00'
>>> decode(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 903, in decode
    return _bson_to_dict(data, codec_options)
bson.errors.InvalidBSON: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
>>> decode(data, codec_options=CodecOptions(unicode_decode_error_handler='replace'))
{'a': Regex('���', 0)}
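For context, the three bytes in the pattern are a UTF-8-style encoding of a lone surrogate code point, which strict UTF-8 forbids. The sketch below uses only the Python standard library (it does not import bson) to show why strict decoding fails and why the 'replace' handler produces three replacement characters, as in the output above. The variable names are illustrative.

```python
raw = b'\xed\xbc\xad'

# Strict UTF-8 decoding rejects the sequence.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'surrogatepass' reveals the code point the bytes would otherwise encode.
code_point = ord(raw.decode('utf-8', 'surrogatepass'))
is_surrogate = 0xD800 <= code_point <= 0xDFFF

# 'replace' substitutes U+FFFD for each undecodable byte.
replaced = raw.decode('utf-8', 'replace')

print(strict_ok, hex(code_point), is_surrogate, len(replaced))
# prints: False 0xdf2d True 3
```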
The same case without the C extensions does properly raise an error:
>>> encode({'a':Regex(b'\xed\xbc\xad')})
Traceback (most recent call last):
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 479, in _make_c_string_check
    _utf_8_decode(string, None, True)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 871, in encode
    return _dict_to_bson(document, check_keys, codec_options)
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 808, in _dict_to_bson
    elements.append(_element_to_bson(key, value,
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 794, in _element_to_bson
    return _name_value_to_bson(name, value, check_keys, opts)
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 736, in _name_value_to_bson
    return _ENCODERS[type(value)](name, value, check_keys, opts)
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 610, in _encode_regex
    return b"\x0B" + name + _make_c_string_check(value.pattern) + b"\x00"
  File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 482, in _make_c_string_check
    raise InvalidStringData("strings in documents must be valid "
bson.errors.InvalidStringData: strings in documents must be valid UTF-8: b'\xed\xbc\xad'
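The kind of check the pure-Python encoder applies in the traceback above can be approximated with the standard library alone. This is a rough sketch, not the driver's code: the function name make_c_string_checked is hypothetical, it raises ValueError instead of bson's own exception classes, and the NUL-byte check is an assumption based on the BSON cstring format.

```python
def make_c_string_checked(pattern: bytes) -> bytes:
    """Return pattern as a NUL-terminated cstring, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    # A BSON cstring is NUL-terminated, so it cannot contain an embedded NUL.
    if b"\x00" in pattern:
        raise ValueError("cstring must not contain a NUL byte")
    try:
        # Strict decode is used purely as validation; the result is discarded.
        pattern.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"strings in documents must be valid UTF-8: {pattern!r}"
        ) from exc
    return pattern + b"\x00"


valid = make_c_string_checked(b"^abc")

try:
    make_c_string_checked(b"\xed\xbc\xad")
    rejected = False
except ValueError:
    rejected = True
```

Under these assumptions, a valid pattern gains its terminator and the bytes from this report are rejected at encode time, before an undecodable document can be produced.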
We should fix the C extensions.
Edit: I had a copy-paste error in my PyPy example. PyPy behaves the same as CPython without the C extensions.