[SERVER-18447] RLP fails to tokenize Chinese strings with ESC control characters Created: 12/May/15  Updated: 11/Oct/18  Resolved: 11/Oct/18

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 3.1.2
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Kamran K.
Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Done
Votes: 0
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

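var t = db.rlp;
t.drop();
 
assert.commandWorked(t.ensureIndex({a: 'text'}));
// Expected: the query runs and matches no documents.
// Actual: it fails with error code 28627 (RLP return code -10005).
assert.eq(t.find({$text: {$search: '\u001b', $language: 'zht'}}).itcount(), 0);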

Participants:

 Description   

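This is a bug in RLP (Rosette Linguistics Platform), but it can cause issues for MongoDB users who attempt to query for, or index, Chinese strings that contain the ESC control character (U+001B). A $text query on such a string fails with error code 28627 and RLP return code -10005 instead of returning results.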

It affects both Traditional Chinese (zht) and Simplified Chinese (zhs) strings.

> var t = db.rlp;
> t.drop();
true
 
> t.ensureIndex({a: 'text'});
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
 
// Traditional Chinese
> t.find({$text: {$search: '\u001b', $language: 'zht'}});
Error: error: {
	"$err" : "Unable to process the document with return code: -10005, and document '\u001b'.",
	"code" : 28627
}
 
// Simplified Chinese
> t.find({$text: {$search: '\u001b', $language: 'zhs'}});
Error: error: {
	"$err" : "Unable to process the document with return code: -10005, and document '\u001b'.",
	"code" : 28627
}
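
One possible client-side mitigation, offered only as a sketch and not as a fix confirmed in this ticket, is to remove ESC (U+001B) from a string before using it as a $text search term or storing it in a text-indexed field. The names stripEsc and userInput are hypothetical and are not part of MongoDB or RLP. Stripping the character changes the text that is searched or stored, so this only suits data where ESC carries no meaning.

```javascript
// Hypothetical helper, not part of MongoDB or RLP.
// Assumption: dropping ESC (U+001B) is acceptable for the application's data.
function stripEsc(s) {
    // Remove every occurrence of the ESC control character.
    return s.replace(/\u001b/g, '');
}

// Possible use in the mongo shell, with collection `t` as in the session above
// and `userInput` standing in for an application-supplied string:
// t.find({$text: {$search: stripEsc(userInput), $language: 'zht'}});
```

This does not address the underlying RLP tokenization failure; it only keeps the offending character from reaching the tokenizer.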



 Comments   
Comment by Mark Benvenuto [ 11/Oct/18 ]

RLP support has been removed from the product.
