Python Driver / PYTHON-668

Why is BSON decoding UTF-8 with "strict" mode?

    • Type: Task
    • Resolution: Done
    • Priority: Minor - P4
    • Affects Version/s: 2.7
    • Component/s: None
    • Environment: All

      Hi there,

      I'm having trouble with a MongoDB collection that contains invalid UTF-8 strings. My Python code iterates over the collection with a cursor and raises a UnicodeDecodeError:

      File "build/bdist.macosx-10.9-x86_64/egg/mongo_connector/oplog_manager.py", line 354, in docs_to_dump
        for doc in cursor:
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymongo/cursor.py", line 904, in next
        if len(self.__data) or self._refresh():
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymongo/cursor.py", line 865, in _refresh
        limit, self.__id))
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymongo/cursor.py", line 800, in __send_message
        self.__uuid_subtype)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymongo/helpers.py", line 107, in _unpack_response
        as_class, tz_aware, uuid_subtype)
      UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 11: invalid continuation byte

      After a few minutes of investigation, the error appears to come from the "strict" error mode used for all UTF-8 decoding in the C layer of the BSON library (https://github.com/mongodb/mongo-python-driver/blob/master/bson/_cbsonmodule.c). Is there any chance "ignore" could be used instead? The mongoexport tool, for example, skips the characters that fail to decode.
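
      To illustrate the difference between the two modes, here is a minimal pure-Python sketch (written for Python 3, and independent of PyMongo). The sample bytes and the helper names decode_utf8 and describe_failure are made up for this example and are not part of the driver. A Latin-1 encoded "é" (byte 0xe9) followed by an ASCII byte triggers the same "invalid continuation byte" error as in the traceback above.

```python
# Illustrative only: the sample bytes and helper names are hypothetical,
# not taken from PyMongo or the BSON C extension.
RAW = "caf\u00e9 au lait".encode("latin-1")  # b"caf\xe9 au lait", not valid UTF-8


def decode_utf8(raw, errors="strict"):
    """Decode raw bytes as UTF-8 using the given codec error handler."""
    return raw.decode("utf-8", errors)


def describe_failure(raw):
    """Return (offset, reason) of the first invalid byte, or None if raw is valid UTF-8."""
    try:
        decode_utf8(raw, "strict")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


if __name__ == "__main__":
    print(describe_failure(RAW))        # where and why "strict" fails
    print(decode_utf8(RAW, "ignore"))   # the invalid byte is silently dropped
    print(decode_utf8(RAW, "replace"))  # the invalid byte becomes U+FFFD
```

      Note that "ignore" loses data without any signal, whereas "replace" at least leaves a visible U+FFFD marker where the invalid byte was; which trade-off is acceptable depends on the application.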

      Please let me know what you think.

            Assignee: Bernie Hackett (bernie@mongodb.com)
            Reporter: Sylvain Utard (sylvainutard)
            Votes: 0
            Watchers: 2
