Details
- Improvement
- Resolution: Done
- Minor - P4
- None
- 4.0.6
- None
Description
I need to develop a partial-word search utility for CJK text (mostly Korean).
MongoDB text search doesn't support CJK characters, so I came up with a trick.
I keep the original data and the searchable data in separate fields, e.g. 'title' and 'rawTitle'.
When I insert or update a title, my Spring (Java) code converts the title string and stores the result in rawTitle, like this:
// convert
String rawTitle = URLEncoder.encode(title, "UTF-8");
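However, URLEncoder leaves letters and digits unchanged, so for those characters rawTitle would be the same as title. I therefore convert each letter or digit separately: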
// convert each letter or digit character
String rawTitle = "$" + Integer.toHexString(alphabetOrNumberCharacter);
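A minimal sketch of the whole conversion in one method, assuming the two rules above are applied character by character: ASCII letters and digits become "$" plus their lowercase hex code, and every other character goes through URLEncoder as UTF-8. The names `RawTitleConverter` and `toRaw` are hypothetical, and the `URLEncoder.encode(String, Charset)` overload used here requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RawTitleConverter {

    // Hypothetical helper that builds the searchable rawTitle value.
    // ASCII letters/digits -> "$" + lowercase hex code point (e.g. 'e' -> "$65").
    // Anything else -> URL-encoded UTF-8 bytes (e.g. '나' -> "%EB%82%98", ' ' -> "+").
    static String toRaw(String text) {
        StringBuilder raw = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128 && Character.isLetterOrDigit(cp)) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                String ch = new String(Character.toChars(cp));
                raw.append(URLEncoder.encode(ch, StandardCharsets.UTF_8));
            }
        });
        return raw.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRaw("나다"));    // %EB%82%98%EB%8B%A4
        System.out.println(toRaw("qw eq"));  // $71$77+$65$71
    }
}
```

Because each converted letter, digit, or encoded byte starts with "$" or "%", a converted query should only line up with whole characters of a converted title, which is what lets a phrase search act like a partial-word match.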
To search, I convert the query text the same way and wrap it in double quotes so that it is treated as a phrase:
// search
String searchText = "\"" + convertedText + "\"";
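A small sketch of the query side, assuming `convertedText` was produced by the same conversion used for rawTitle. Besides the quoted phrase, it builds a `$text` filter as JSON text that could be pasted into the mongo shell, which may reduce the manual convert-and-paste step. `TextQuery`, `phrase` and `textFilterJson` are hypothetical names.

```java
class TextQuery {

    // Wraps converted text in double quotes so $text treats it as a single phrase
    static String phrase(String convertedText) {
        return "\"" + convertedText + "\"";
    }

    // Builds a $text filter as JSON text, e.g. for pasting into the mongo shell.
    // The phrase quotes are escaped for JSON; convertedText itself needs no escaping
    // because URLEncoder percent-encodes double quotes and backslashes (%22, %5C).
    static String textFilterJson(String convertedText) {
        return "{ \"$text\": { \"$search\": \"\\\"" + convertedText + "\\\"\" } }";
    }

    public static void main(String[] args) {
        String converted = "%EB%82%98%EB%8B%A4"; // converted form of the query '나다'
        System.out.println(phrase(converted));
        System.out.println(textFilterJson(converted));
    }
}
```

With the MongoDB Java driver, the string returned by `phrase` could instead be passed to `Filters.text(...)` when building the query.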
This also requires a text index:
// text index
db.collection.createIndex({
    rawTitle: 'text'
}, {
    default_language: 'none'
});
This trick is inconvenient in the CLI (mongo shell), because to search for something I must first convert the text and paste the result.
But it works for me, even with CJK characters.
For example, to find '가나다', I type '나다'.
The search utility converts '나다' to '%EB%82%98%EB%8B%A4',
and I get '가나다' as a result.
Likewise, suppose I have three documents:
// collection
{ title: 'qweqweqwe' }
{ title: 'qweewqqwe' }
{ title: 'qw eq we' }
If I search for 'eq' with the search utility, I get 'qweqweqwe' and 'qw eq we' (but not 'qweewqqwe').
I think it's a useful trick, but I'm worried about performance.
I'd like to hear your opinion. Thanks.
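The example above can be approximated in memory with the sketch below. It assumes, as a simplification rather than MongoDB's actual code path, that a quoted `$text` phrase matches when the document's rawTitle contains the converted query as a substring; it models neither the text index nor performance. `PhraseMatchDemo`, `toRaw` and `search` are hypothetical names, and Java 10+ is assumed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class PhraseMatchDemo {

    // Same hypothetical conversion: "$" + hex for ASCII letters/digits, URL-encoding otherwise
    static String toRaw(String text) {
        StringBuilder raw = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128 && Character.isLetterOrDigit(cp)) {
                raw.append('$').append(Integer.toHexString(cp));
            } else {
                raw.append(URLEncoder.encode(new String(Character.toChars(cp)), StandardCharsets.UTF_8));
            }
        });
        return raw.toString();
    }

    // Assumption: phrase match approximated as "converted title contains converted query"
    static List<String> search(List<String> titles, String query) {
        String rawQuery = toRaw(query);
        return titles.stream()
                .filter(title -> toRaw(title).contains(rawQuery))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> titles = List.of("qweqweqwe", "qweewqqwe", "qw eq we");
        System.out.println(search(titles, "eq"));           // [qweqweqwe, qw eq we]
        System.out.println(search(List.of("가나다"), "나다")); // [가나다]
    }
}
```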
Issue Links
- is related to: SERVER-29598 Support Korean language in full text search (Backlog)