[SERVER-20256] Korean Language Created: 02/Sep/15 Updated: 19/Nov/15 Resolved: 19/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siamak | Assignee: | Kelsey Schubert |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Dear Sir/Madam, I know that you fixed this problem, but I am having a problem with Unicode for the Korean language. I am trying to import a Korean Wikipedia corpus into MongoDB on Linux, but when I search for a word in MongoDB through my Java application, it does not find any matching word. What should I do? I tried converting the corpus to UTF-8, both in my query and in MongoDB, but the results were the same. |
| Comments |
| Comment by Kelsey Schubert [ 19/Nov/15 ] |
|
Hi 30yamak, Sorry for the long delay getting back to you. I have imported your data and successfully queried the term field. Please see the examples below:
It's worth noting that some fonts may render two code points as a single character. Depending on your font, these two may appear identical: 지 지. However, one of them is encoded as two Unicode code points, whereas the other is a single code point. The code points stored in the document must match those in the query. I am closing this ticket since we can't reproduce this issue. If you can share a runnable reproduction script, preferably in JavaScript, we'll be happy to take another look. Thank you, |
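The code point mismatch described in the comment above can be illustrated with a small Java sketch. It assumes the pair in question is the precomposed syllable U+C9C0 and the conjoining jamo sequence U+110C U+1175; the ticket does not state the exact code points, and nothing here confirms that the reporter's corpus was affected. The class name and the `toNfc` and `codePoints` helpers are hypothetical; only `java.text.Normalizer` is a standard JDK API.

```java
import java.text.Normalizer;

// Illustrative sketch only: class and helper names are hypothetical.
class HangulNormalizationSketch {

    // Normalize to NFC so that conjoining jamo are composed into
    // precomposed Hangul syllables where possible.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Render a string as a space-separated list of its code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed example pair: both typically render as the same glyph.
        String precomposed = "\uC9C0";       // one code point
        String decomposed = "\u110C\u1175";  // two code points (jamo)

        System.out.println(codePoints(precomposed)); // U+C9C0
        System.out.println(codePoints(decomposed));  // U+110C U+1175

        // Plain string equality compares code units, so these differ.
        System.out.println(precomposed.equals(decomposed)); // false

        // After normalizing both sides to NFC they compare equal.
        System.out.println(toNfc(precomposed).equals(toNfc(decomposed))); // true
    }
}
```

If a mismatch of this kind were the cause, one possible approach would be to apply the same normalization form to strings both before inserting them and before building queries, so that stored and queried values use identical code points.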
| Comment by Siamak [ 25/Sep/15 ] |
|
Is there any news on this? |
| Comment by Siamak [ 17/Sep/15 ] |
|
Dear Sam, I have uploaded the mongodump to my Dropbox because its size exceeded the attachment limit: https://dl.dropboxusercontent.com/u/6149013/terms.bson.gz With Best Wishes, |
| Comment by Sam Kleinman (Inactive) [ 16/Sep/15 ] |
|
Can you provide some of your data in the form of a mongodump .bson file? This will allow me to try my reproduction with your data. Regards, |
| Comment by Siamak [ 14/Sep/15 ] |
|
Dear Sam, Thank you for your help. I have attached my data from MongoDB. |
| Comment by Sam Kleinman (Inactive) [ 11/Sep/15 ] |
|
Sorry for not getting back to you sooner. I've been trying to reproduce this issue with the mongo shell, without luck. You can see my attempt to translate your example here:
I have some more questions about your issue:
Thanks again for your help. Regards, |
| Comment by Siamak [ 02/Sep/15 ] |
|
I also tried to find a specific string using \uXXXX escapes, but the return value was null again: |
| Comment by Siamak [ 02/Sep/15 ] |
|
This is my code, which converts the string to UTF-8 both when I insert my data into MongoDB and when I query it: //For Insert to Mongodb: //To find From Mongodb |