Python Driver / PYTHON-1697

Looping over Pymongo cursor returns bson.errors.InvalidBSON error after some iterations

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: BSON
    • Environment:
      Python 2.7

      One collection in MongoDB holds about 8 million records.

      I iterate over it like this:

      result = collection.find()
      for record in result:
          print record

      After it has been running for more than 50 minutes, it fails with the error below:

      tart query data
      2018-12-04 23:28:19,875 - logger.py: 37 - ERROR - Exception when loading data: SON([(u'uid', u'336543dfafd443d0872b48cda0e13333'), (u'platformCode', u'xxxx'), (u'loginPlatformCode', u'xxx'), (u'createTime', u'2018-10-15 10:58:25'), (u'eventName', u'\u70b9\u51fb\u753b\u7b14\u56fe\u6807'), (u'simpleName', xxx'), (u'sn', u'xxxx'), (u'courseNum', u'aaaa'), (u'_id', {'$oid': '5bc4028d10d57c78d9658455'}), (u'type', u'1'), (u'packageName', u'xxxx')])
      'utf8' codec can't decode byte 0xce in position 29: invalid continuation byte
      Traceback (most recent call last):
        File "/schedule.wordir/etl/data_loader.py", line 45, in run
          for record in input.extract():
        File "/schedule.wordir/etl/input/data_input_mongo_full.py", line 56, in extract
          for record in result:
        File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1169, in next
          if len(self.__data) or self._refresh():
        File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1106, in _refresh
          self.__send_message(g)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 971, in __send_message
          codec_options=self.__codec_options)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1055, in _unpack_response
          return response.unpack_response(cursor_id, codec_options)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/message.py", line 945, in unpack_response
          return bson.decode_all(self.documents, codec_options)
      InvalidBSON: 'utf8' codec can't decode byte 0xce in position 29: invalid continuation byte
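
The error message comes from Python's UTF-8 decoder: byte 0xce announces a two-byte sequence, and the byte after it is not a valid continuation byte. A minimal sketch that reproduces the same message with the standard library only is shown here. It is written for Python 3 (the report uses Python 2.7), and the `raw` bytes and the `try_decode` helper are made up for illustration; they are not taken from the reporter's data.

```python
# Python 3 sketch; the bytes below are hypothetical, not the reporter's data.
# 0xce starts a 2-byte UTF-8 sequence, but "A" (0x41) cannot continue it.
raw = b"abc\xceA"


def try_decode(data, errors="strict"):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return data.decode("utf-8", errors), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Strict decoding fails with the same kind of message as in the traceback.
text, err = try_decode(raw)
print(err)

# Lenient handlers keep going: "replace" substitutes U+FFFD, "ignore" drops the byte.
replaced, _ = try_decode(raw, errors="replace")
ignored, _ = try_decode(raw, errors="ignore")
print(repr(replaced), repr(ignored))
```

PyMongo's `CodecOptions` accepts a `unicode_decode_error_handler` argument that takes these same handler names, so a lenient handler may be one way to read past such a string. Whether that is appropriate here depends on how the malformed data was written, which the report does not say.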

      However, when I query only that specific record (u'_id', {'$oid': '5bc4028d10d57c78d9658455'}) with the same code, it works fine.
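
One possible reading of the traceback, offered as an assumption and not confirmed in the report: the exception is raised in `_refresh`, where `bson.decode_all` decodes a whole batch at once, so the record printed in the log may be the last one processed successfully, and the malformed string may sit in a document of the following batch. The sketch below simulates that situation with plain byte strings in Python 3. The `batches` data and both helper functions are hypothetical and do not use PyMongo.

```python
def decode_batch(batch):
    """Decode every value in the batch at once; one bad value fails the whole batch."""
    return [value.decode("utf-8") for value in batch]


def find_undecodable(batch):
    """Return (index, error message) for each value that is not valid UTF-8."""
    bad = []
    for index, value in enumerate(batch):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((index, str(exc)))
    return bad


# Hypothetical data: batch 0 is clean, batch 1 contains one malformed string.
batches = [
    [b"first", b"last good record"],
    [b"ok", b"bad \xceA value", b"also ok"],
]

last_processed = None
failed_batch = None
for number, batch in enumerate(batches):
    try:
        for text in decode_batch(batch):
            last_processed = text
    except UnicodeDecodeError:
        # The whole batch is lost, so last_processed still points at batch 0.
        failed_batch = number
        break

# Checking the failed batch value by value isolates the actual offender.
offenders = find_undecodable(batches[failed_batch])
print(last_processed, failed_batch, offenders)
```

If this reading applies, re-querying the logged `_id` would succeed, as described above, because that document was never the problem.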

       

      I found the same question on Stack Overflow:

      https://stackoverflow.com/questions/51734811/looping-over-pymongo-cursor-returns-bson-errors-invalidbson-error-after-some-ite

       

       

            Assignee:
            bernie@mongodb.com Bernie Hackett
            Reporter:
            xhmz xhmz
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: