[SERVER-17165] Full Text returns wrong results for Turkish
Created: 03/Feb/15  Updated: 03/Feb/15  Resolved: 03/Feb/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.0-rc7
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Mark Benvenuto
Assignee: Matt Kangas
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-8423 Text search case folding needs utf-8 ... Closed
Operating System: ALL
Participants:

 Description   

When we lowercase Turkish words for text search, we apply English case rules instead of Turkish ones. In Turkish, the lowercase form of "I" is the dotless "ı", not "i" (note the missing dot in the glyph); the dotted lowercase "i" pairs with the dotted capital "İ".
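The difference between the two case mappings can be sketched in a few lines of JavaScript. `turkishLower` and `englishLower` are hypothetical helpers written for illustration; they are not the server's case-folding code, and `turkishLower` special-cases only the dotted and dotless "I" pairs before falling back to the default mapping.

```javascript
// Hypothetical helpers for illustration only; not MongoDB's case-folding code.
// Turkish rules: U+0130 (dotted capital I) -> "i", and "I" -> U+0131 (dotless i).
function turkishLower(s) {
  return s
    .replace(/\u0130/g, "i")   // dotted capital I -> dotted small i
    .replace(/I/g, "\u0131")   // dotless capital I -> dotless small i
    .toLowerCase();            // default mapping for every other letter
}

// Locale-independent default mapping, which treats "I" the English way.
const englishLower = (s) => s.toLowerCase();

console.log(englishLower("QUIT"));      // quit
console.log(turkishLower("QUIT"));      // quıt (dotless)
console.log(turkishLower("QU\u0130T")); // quit
```

Under the default mapping "QUIT" collapses onto "quit", which is why the two words end up as the same search term.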

Load Script:

db.turk.drop()
 
db.turk.insert({ _id: "small_dotless", t1 : "quıt" })
db.turk.insert({ _id: "small_dot", t1 : "quit" })
db.turk.insert({ _id: "big_dotless", t1 : "QUIT" })
db.turk.insert({ _id: "big_dot", t1 : "QUİT" })
 
db.turk.ensureIndex( { t1 : "text"} , {default_language : "turkish" })

Actual Results:

> db.turk.find( {$text: {$search: "quit" }})
{ "_id" : "small_dot", "t1" : "quit" }
{ "_id" : "big_dotless", "t1" : "QUIT" }

Expected Results:

> db.turk.find( {$text: {$search: "quit" }})
{ "_id" : "small_dot", "t1" : "quit" }
{ "_id" : "big_dot", "t1" : "QUİT" }
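The gap between the actual and expected results can be reproduced with a simplified model that compares whole folded strings. This is only a sketch under stated assumptions: it ignores tokenization, stemming, and stop words, and the `turkishLower`, `englishLower`, and `matchIds` names are hypothetical, not part of the server.

```javascript
// Simplified model of the report; not how the text index is implemented.
const docs = [
  { _id: "small_dotless", t1: "qu\u0131t" }, // quıt
  { _id: "small_dot", t1: "quit" },
  { _id: "big_dotless", t1: "QUIT" },
  { _id: "big_dot", t1: "QU\u0130T" },       // QUİT
];

// Hypothetical Turkish-aware fold: handle the two "I" pairs explicitly first.
const turkishLower = (s) =>
  s.replace(/\u0130/g, "i").replace(/I/g, "\u0131").toLowerCase();

// Default fold, which applies English-style rules to "I".
const englishLower = (s) => s.toLowerCase();

// Return the _id of every document whose folded t1 equals the folded query.
function matchIds(fold, query) {
  const term = fold(query);
  return docs.filter((d) => fold(d.t1) === term).map((d) => d._id);
}

console.log(matchIds(englishLower, "quit")); // [ 'small_dot', 'big_dotless' ]
console.log(matchIds(turkishLower, "quit")); // [ 'small_dot', 'big_dot' ]
```

The first call mirrors the actual results above and the second mirrors the expected ones.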



 Comments   
Comment by Matt Kangas [ 03/Feb/15 ]

Duplicates SERVER-8423

Generated at Thu Feb 08 03:43:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.