[JAVA-1791] Full text searching a Turkish word using mongodb-java-driver does not work Created: 01/May/15 Updated: 11/Sep/19 Resolved: 01/May/15
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | None |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Hakan Özler | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
My System settings: |
| Description |
I have a text index on the text field, with Turkish as its default language. When I run the query in the mongo shell with the following script, I get a total of 17 results:
When I run the same query against the same database in Java using the MongoDB Java Driver, I get 0 results. Here is the code snippet that I use:
I found that this happens only when the search word contains one of the special Turkish characters that we use day in and day out: "ı, ç, ü, ö, ş, ğ". When I search for a word that contains none of them, say "hafta" (English: "week"), I get the same result in both the mongo shell and Java.
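The pattern described above fits a source-encoding mismatch, which the discussion below goes on to confirm. Each of those Turkish letters is non-ASCII and takes two bytes in UTF-8, while a word like "hafta" is pure ASCII and reads the same under any common encoding. The following JDK-only sketch (no driver involved) simulates a UTF-8 source literal being decoded as windows-1254. The class and method names are hypothetical. The file uses only Unicode escapes so it compiles identically under any `-encoding`. It also assumes the runtime provides the `windows-1254` charset, which full JDK builds include but the Java specification does not guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Simulates javac reading a UTF-8 source file with the wrong encoding.
    static String misread(String intended, Charset compilerEncoding) {
        byte[] utf8Bytes = intended.getBytes(StandardCharsets.UTF_8); // bytes as saved on disk
        return new String(utf8Bytes, compilerEncoding);               // how the compiler decodes them
    }

    // Renders non-ASCII chars as backslash-u escapes so the output is readable in any console.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the windows-1254 charset is available in this runtime.
        Charset turkishWindows = Charset.forName("windows-1254");
        // The c-cedilla is written as an escape so this file stays pure ASCII.
        String[] words = {"ma\u00e7", "hafta"};
        for (String word : words) {
            String seen = misread(word, turkishWindows);
            System.out.printf("%s -> %s (equal: %b)%n", escape(word), escape(seen), word.equals(seen));
        }
    }
}
```

Running it prints `ma\u00e7 -> ma\u00c3\u00a7 (equal: false)` and `hafta -> hafta (equal: true)`. The single "ç" becomes two characters, so the text index would be searched for a different term, while the ASCII-only word is unaffected.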
| Comments |
| Comment by Jeffrey Yemin [ 01/May/15 ] |

No worries. Glad we were able to work through it.
| Comment by Hakan Özler [ 01/May/15 ] |

Thanks Jeff, I just realised what the problem was. Sorry for taking up your time.
| Comment by Jeffrey Yemin [ 01/May/15 ] |

In IntelliJ preferences, please try configuring Editor -> File Encodings to UTF-8 for your project.
| Comment by Hakan Özler [ 01/May/15 ] |

I actually run the code in IntelliJ, and when I look at the command it uses to run the file, I see "-Dfile.encoding=windows-1254".
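To see which default charset a given run configuration gives the JVM, a small diagnostic like the following can be launched with the same IntelliJ run configuration. This is a sketch with a hypothetical class name. It reports only the runtime default. The mismatch in this issue arises when javac decodes the source file, which is controlled separately: by javac's `-encoding` option, or by the project encoding the IDE passes to its compiler.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Collects the encoding settings the running JVM sees.
    static String report() {
        return "file.encoding  = " + System.getProperty("file.encoding") + System.lineSeparator()
             + "defaultCharset = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // On JDK 8-era runtimes, a -Dfile.encoding=windows-1254 flag typically shows up in both lines.
        System.out.println(report());
    }
}
```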
| Comment by Jeffrey Yemin [ 01/May/15 ] |

Try a character-by-character comparison of "maç" and "ma\u00e7", as this looks like a character encoding issue during compilation. Are you setting the character encoding of your source file with the -encoding option on javac?
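The comparison suggested above can be sketched as follows. The helper names are hypothetical. In the reporter's file, the second string would be the literal "maç" typed in the editor. Here it is replaced by `"ma\u00c3\u00a7"`, the value that literal would take if a UTF-8 file were compiled as windows-1254. That is an assumption based on the `-Dfile.encoding=windows-1254` setting reported in this thread, and it keeps the example itself encoding-independent. If the two strings differ, recompiling with `javac -encoding UTF-8` (or setting the IDE's project encoding to UTF-8) is the usual fix.

```java
public class CharByCharCompare {
    // Returns the index of the first differing char, or -1 if the strings are identical.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    // Lists each code point in hex, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(sb.length() == 0 ? "" : " ").append(Integer.toHexString(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Unicode escape yields U+00E7 regardless of the source file encoding.
        String escaped = "ma\u00e7";
        // Hypothetical stand-in for a UTF-8 literal that javac decoded as windows-1254.
        String misDecoded = "ma\u00c3\u00a7";
        System.out.println("escaped:     " + codePoints(escaped));
        System.out.println("mis-decoded: " + codePoints(misDecoded));
        System.out.println("first difference at index " + firstDifference(escaped, misDecoded));
    }
}
```

This prints `6d 61 e7` for the escaped string, `6d 61 c3 a7` for the mis-decoded one, and reports the first difference at index 2, exactly where "ç" should be.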
| Comment by Hakan Özler [ 01/May/15 ] |

Hi, I now get the result when I specify 'ç' as a Unicode escape. However, the last statement throws a NullPointerException.
| Comment by Jeffrey Yemin [ 01/May/15 ] |

I'm not able to reproduce this with the following test program:
which outputs the following:
Can you reproduce these results?