Core Server / SERVER-24007

Server can return invalid UTF8 for error messages due to truncation in the middle of a code point

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Internal Code
    • Environment:
      Ubuntu 14.04 / AWS EC2
    • Query Execution
    • ALL
    • Platforms 15 (06/03/16)

      With a unique index on a field containing Korean text, the driver raises
      an error when I insert a duplicate document.

      See below:


      cswcsy@niklane-Samsung-Ubuntu:~/crawlers/CrawlerPlatform/utils$ python mongo_test.py
      <pymongo.results.InsertOneResult object at 0x7fba9819e500>
      cswcsy@niklane-Samsung-Ubuntu:~/crawlers/CrawlerPlatform/utils$ python mongo_test.py
      Traceback (most recent call last):
        File "mongo_test.py", line 24, in <module>
          result = col.insert_one(script)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 625, in insert_one
          bypass_doc_val=bypass_document_validation),
        File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 530, in _insert
          check_keys, manipulate, write_concern, op_id, bypass_doc_val)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 512, in _insert_one
          check_keys=check_keys)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 218, in command
          self._raise_connection_failure(error)
        File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 346, in _raise_connection_failure
          raise error
      bson.errors.InvalidBSON: 'utf8' codec can't decode byte 0xeb in position 230: invalid continuation byte
      cswcsy@niklane-Samsung-Ubuntu:~/crawlers/CrawlerPlatform/utils$


      As shown above, it reproduces 100% of the time when I insert the same
      document twice into a collection with a unique index on a Korean-text field.
      It did not reproduce when I used other Korean content.
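The title attributes the invalid UTF-8 to an error message truncated in the middle of a code point. A minimal Python 3 sketch of that failure mode, using only the standard library: each Hangul syllable is three bytes in UTF-8, so cutting the encoded bytes right after the lead byte of a syllable and then appending more ASCII text produces the same kind of decode error as the traceback. The helper name `truncated_decode_error`, the cut lengths, and the `b' }'` suffix are illustrative assumptions, not the server's actual error-message format.

```python
def truncated_decode_error(text, cut, suffix=b' }'):
    """Encode `text` as UTF-8, keep the first `cut` bytes, append `suffix`
    (hypothetical trailing text), and report why decoding fails.

    Returns (reason, byte offset, offending byte) or None if it decodes cleanly.
    """
    data = text.encode('utf-8')[:cut] + suffix
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc.reason, exc.start, hex(data[exc.start])
    return None


# u'\uc190\ub300\ud328' is a word from the reported title. Each syllable is
# 3 bytes, so a 4-byte cut keeps only the lead byte (0xeb) of the 2nd syllable.
print(truncated_decode_error(u'\uc190\ub300\ud328', 4))
# ('invalid continuation byte', 3, '0xeb')

# A cut that lands on a code-point boundary decodes without error.
print(truncated_decode_error(u'\uc190\ub300\ud328', 3))
# None
```

Whether decoding fails depends only on where the cut lands relative to character boundaries, which may explain why other Korean content did not reproduce the problem.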

      Here is my test code:


      # -*- coding: utf-8 -*-
      from pprint import pprint

      from pymongo import MongoClient

      mongo = MongoClient('localhost', 27017)
      db = mongo['bigdata']
      col = db['test']

      # The collection is assumed to already have a unique index on one of the
      # Korean-text fields below; this script does not create the index itself.
      script = {
          'brand_name': u'\ub77c\uc628',
          'category0': u'\uc0dd\ud65c/\uac74\uac15',
          'category1': u'\uacf5\uad6c',
          'category2': u'\ubaa9\uacf5\uacf5\uad6c',
          'category3': u'\ub300\ud328',
          'entity': [],
          'price': 9300,
          'title': u'\uad6c \uad6d\uc0b0 \ub300\ud328 \uc190\ub300\ud328 \ubaa9\uacf5\uacf5\uad6c \ubbf8\ub2c8\ub300\ud328 \ubaa8\uc11c\ub9ac\ub300\ud328 \ub300\ud328\ub0a0 \ubaa9\uacf5\uad6c \uc804\ub3d9\ub300\ud328 \ubaa9\uc218\uacf5\uad6c \ubaa9\uacf5\uc608 \ud648\ub300\ud328 DIY\uacf5\uad6c \ud3c9\uba74 \ub2e4\ub4ec\uae30',
      }

      # First run: the insert succeeds. Second run (duplicate document):
      # raises bson.errors.InvalidBSON, as in the traceback above.
      result = col.insert_one(script)
      pprint(result)
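One way a truncating routine could avoid splitting a character is to step back over UTF-8 continuation bytes (bit pattern `10xxxxxx`) before cutting. The sketch below is a hypothetical Python helper, `truncate_utf8`, shown only to illustrate that technique; it is not MongoDB's implementation, and it assumes its input is valid UTF-8.

```python
def truncate_utf8(data, max_bytes):
    """Return at most `max_bytes` of UTF-8 `data` without splitting a code point.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8 bytes.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. While it is a
    # continuation byte (0b10xxxxxx), the cut is inside a character, so move
    # the cut left until it sits on the first byte of a code point.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


raw = u'\uc190\ub300\ud328'.encode('utf-8')  # 3 syllables, 9 bytes
for limit in (3, 4, 5, 6):
    # Every result decodes cleanly; a partial syllable is dropped whole.
    print(limit, len(truncate_utf8(raw, limit)))
# 3 3
# 4 3
# 5 3
# 6 6
```

With a 4- or 5-byte limit the partial second syllable is dropped entirely rather than cut in half, so any text appended after it still decodes.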


      If you need more information, or have a solution for this issue, please reply.

      Thanks a lot!

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            cswcsy Sunook Choi
            Votes:
            1
            Watchers:
            22
