SERVER-18447 (Core Server)

RLP fails to tokenize Chinese strings with ESC control characters


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • None
    • 3.1.2
    • Text Search
    • ALL
    • Steps To Reproduce:

      var t = db.rlp;
      t.drop();
       
      assert.commandWorked(t.ensureIndex({a: 'text'}));
      assert.eq(t.find({$text: {$search: '\u001b', $language: 'zht'}}).itcount(), 0);
      


    Description

      This is a bug in RLP (the Rosette Linguistics Platform), but it can cause errors for MongoDB users who query for, or index, Chinese strings containing the ESC control character (U+001B).

      It affects both Traditional Chinese (zht) and Simplified Chinese (zhs) strings.

      > var t = db.rlp;
      > t.drop();
      true
       
      > t.ensureIndex({a: 'text'});
      {
      	"createdCollectionAutomatically" : true,
      	"numIndexesBefore" : 1,
      	"numIndexesAfter" : 2,
      	"ok" : 1
      }
       
      // Traditional Chinese
      > t.find({$text: {$search: '\u001b', $language: 'zht'}});
      Error: error: {
      	"$err" : "Unable to process the document with return code: -10005, and document '\u001b'.",
      	"code" : 28627
      }
       
      // Simplified Chinese
      > t.find({$text: {$search: '\u001b', $language: 'zhs'}});
      Error: error: {
      	"$err" : "Unable to process the document with return code: -10005, and document '\u001b'.",
      	"code" : 28627
      }
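
      One possible client-side workaround, sketched here as an assumption rather than anything proposed in this ticket, is to strip ESC characters from strings before they are indexed or passed to $search. The helpers stripEsc and textQuery below are hypothetical names and are not part of MongoDB or RLP. The sketch also assumes that U+001B is the only character that triggers return code -10005.

```javascript
// Hypothetical helper (not part of MongoDB or RLP): removes every
// ESC (U+001B) character from a string.
function stripEsc(s) {
    return s.replace(/\u001b/g, '');
}

// Hypothetical wrapper: builds a $text query document whose search
// string has been sanitized with stripEsc.
function textQuery(search, language) {
    return {$text: {$search: stripEsc(search), $language: language}};
}
```

      In the shell this would be used as t.find(textQuery(someString, 'zht')). Whether this avoids error 28627 in every case has not been verified here, and the same cleaning would need to be applied to documents before insertion.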
      

          People

            backlog-server-platform DO NOT USE - Backlog - Platform Team
            kamran.khan Kamran K.