[SERVER-45859] Text Indexes with partial word match or CJK match Created: 30/Jan/20  Updated: 31/Jan/20  Resolved: 31/Jan/20

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 4.0.6
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Karen Takahashi Assignee: Carl Champain (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-29598 Support Korean language in full text ... Backlog
Participants:

 Description   

I need to build a partial-word search utility for CJK text (mostly Korean).

MongoDB text search doesn't support CJK text, so I came up with a workaround.

 

I separate the original data field from the search data field, for example 'title' and 'rawTitle'.

When I insert or update a title, I convert the title String value in Spring (Java) and store the result in rawTitle.

 

Like this.

// convert
String rawTitle = URLEncoder.encode(title, "UTF-8");

However, URLEncoder leaves letters and digits unchanged, so for those characters rawTitle would be the same as title. I convert them separately:

// convert alphabet and numbers
String rawTitle = "$" + Integer.toHexString(alphabetOrNumberCharacter);
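Putting the two conversions together, a minimal sketch of the encoder might look like the following. The class and method names (`RawTitleEncoder`, `encode`) are hypothetical and not from the ticket, and restricting the "$" + hex rule to ASCII letters and digits is an assumption about what "alphabet or numbers" means here. Everything else goes through `URLEncoder`, which also turns a space into `+` and leaves `.`, `-`, `*`, and `_` unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: converts a title into the rawTitle form described above.
final class RawTitleEncoder {
    private RawTitleEncoder() {}

    static String encode(String title) {
        StringBuilder out = new StringBuilder();
        // Walk by code point so surrogate pairs reach URLEncoder intact.
        for (int i = 0; i < title.length(); ) {
            int cp = title.codePointAt(i);
            i += Character.charCount(cp);
            if (isAsciiAlphanumeric(cp)) {
                // Assumption: only ASCII letters and digits get the "$" + hex form.
                out.append('$').append(Integer.toHexString(cp));
            } else {
                out.append(urlEncode(new String(Character.toChars(cp))));
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(int cp) {
        return (cp >= '0' && cp <= '9')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= 'a' && cp <= 'z');
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, `RawTitleEncoder.encode("eq")` yields `$65$71`, since 'e' is 0x65 and 'q' is 0x71.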

 

To search, I wrap the converted text in double quotes so that it is treated as a phrase:

// search
String searchText = "\"" + convertedText + "\"";

A text index on rawTitle is also needed:

// textIndexes
db.collection.createIndex({
    rawTitle: 'text'
}, {
    default_language: 'none'
});

This trick is awkward in the CLI, because to search for something I must first convert the text and then paste it in.

But it works for me, even with CJK text.

If I want to find '가나다', I type '나다'.

The search utility converts '나다' to '%EB%82%98%EB%8B%A4'

and I get the result '가나다'.

 

 

When I have 3 documents,

// collection
{
    title: 'qweqweqwe'
}
{
    title: 'qweewqqwe'
}
{
    title: 'qw eq we'
}

 

and I search for the text 'eq' in the search utility, I get 'qweqweqwe' and 'qw eq we'.
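The three-document example can be checked by hand with a small sketch. The rawTitle values below are written out manually using q = $71, w = $77, e = $65, and space = + (the URLEncoder form of a space). Treating the quoted phrase search as a plain substring test is a simplifying assumption for illustration only, not a description of how the server evaluates a text query. The class name `PartialMatchDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the example collection: title -> rawTitle.
final class PartialMatchDemo {
    static final Map<String, String> DOCS = new LinkedHashMap<>();
    static {
        DOCS.put("qweqweqwe", "$71$77$65$71$77$65$71$77$65");
        DOCS.put("qweewqqwe", "$71$77$65$65$77$71$71$77$65");
        DOCS.put("qw eq we", "$71$77+$65$71+$77$65");
    }

    private PartialMatchDemo() {}

    // Returns the titles whose rawTitle contains the already-converted query.
    // Assumption: phrase matching is approximated by String.contains.
    static List<String> search(String rawQuery) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : DOCS.entrySet()) {
            if (doc.getValue().contains(rawQuery)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }
}
```

Calling `PartialMatchDemo.search("$65$71")`, the converted form of 'eq', returns 'qweqweqwe' and 'qw eq we'. In 'qweewqqwe' the letter e is never directly followed by q, so the sequence $65$71 does not occur there.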

 

I think it's a useful trick, but I'm worried about performance.

I'd like to hear your opinion. Thanks.

 

 



 Comments   
Comment by Carl Champain (Inactive) [ 31/Jan/20 ]

Hi signal.be@gmail.com,

The SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Kind regards,
Carl
 
