[SERVER-26658] Full Text returns wrong results for Turkish Created: 17/Oct/16 Updated: 27/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 3.2.10, 3.3.15, 3.4.0-rc0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kemal Ogun Isik | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 4 |
| Labels: | qi-text-search, query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Integration
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Text index version 3 does not return correct search results when a word contains the "Turkish i" character.
|
| Comments |
| Comment by Kyle Suarez [ 15/May/17 ] |
|
Hello ogunisik, Apologies for the delay, and for the frustration with the Turkish diacritic bug. The bug is rather tricky and has no obvious solution, and as I mentioned in my previous comment, it may require a significant rewrite of our codepoint-to-codepoint transformation algorithm. In addition, if that turns out to be required, we may have to bump the text index version. Both of these points have led us to put this ticket on the Backlog, meaning that it is currently not scheduled to be worked on. Sorry again for the inconvenience. Regards, |
| Comment by Kemal Ogun Isik [ 07/May/17 ] |
|
Hello, is there any plan to resolve this issue? We are still waiting. |
| Comment by Kyle Suarez [ 07/Dec/16 ] |
|
I don't know if I've pinpointed the exact underlying cause, but I have some suspicions. unicode::codepointToLower() has special behavior for CaseFoldMode::kTurkish, but unicode::codepointRemoveDiacritics() isn't Turkish-aware. I would assume that, in a Turkish diacritic-insensitive setting, i maps to ı; that is, 0x69 maps to 0x131. However, 0x69 isn't handled as a case in the giant switch statement. An offline chat with redbeard0531 implied that our codepoint-to-codepoint transformation algorithm can't handle the Turkish i properly, but I'm not sure exactly where to look further. |
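A minimal Python sketch of the interaction suspected above, offered as an illustration only and not as the server's C++ code. The helper names `turkish_lower` and `strip_diacritics` are hypothetical. The diacritic step is approximated with NFD decomposition from the standard `unicodedata` module, which is an assumption about how a Turkish-unaware stripper behaves, not a description of `unicode::codepointRemoveDiacritics()`. It shows that a Turkish-aware lowercase step and a Turkish-unaware diacritic step can give different results for İ (0x130) depending on the order in which they run.

```python
import unicodedata


def turkish_lower(ch: str) -> str:
    """Lowercase one character using Turkish rules (hypothetical helper)."""
    if ch == "I":        # 0x49 lowers to dotless i (0x131) in Turkish
        return "\u0131"
    if ch == "\u0130":   # dotted capital I (0x130) lowers to plain i (0x69)
        return "i"
    return ch.lower()


def strip_diacritics(ch: str) -> str:
    """Turkish-unaware diacritic removal: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def strip_then_lower(ch: str) -> str:
    # 0x130 decomposes to "I" + combining dot above, so stripping leaves "I",
    # which Turkish lowercasing then turns into dotless i (0x131).
    return "".join(turkish_lower(c) for c in strip_diacritics(ch))


def lower_then_strip(ch: str) -> str:
    # Turkish lowercasing maps 0x130 straight to "i" (0x69), which has no
    # combining mark to strip.
    return "".join(strip_diacritics(c) for c in turkish_lower(ch))


if __name__ == "__main__":
    for ch in ("I", "\u0130", "i", "\u0131"):
        a, b = strip_then_lower(ch), lower_then_strip(ch)
        print(f"U+{ord(ch):04X}: strip->lower={a!r} lower->strip={b!r} same={a == b}")
```

In this sketch only 0x130 comes out differently under the two orderings, which is consistent with the idea that transforming each codepoint independently, without locale context, is not enough for the Turkish i.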
| Comment by Ramon Fernandez Marina [ 17/Oct/16 ] |
|
Thanks for your report ogunisik, the Query team is going to look into this issue. |