  Core Server / SERVER-45859

Text Indexes with partial word match or CJK match



    • Type: Improvement
    • Status: Closed
    • Priority: Minor - P4
    • Resolution: Done
    • Affects Version/s: 4.0.6
    • Fix Version/s: None
    • Component/s: Text Search


      I need to develop a partial-word search utility that works with CJK text (mostly Korean).

      MongoDB text search doesn't support CJK characters, so I came up with a trick.


      I keep the original data field separate from a search data field, for example 'title' and 'rawTitle'.

      Whenever I insert or update a title, I convert the title String in Spring (Java) and store the result in rawTitle.


      Like this.

      // convert
      String rawTitle = URLEncoder.encode(title, "UTF-8");

      However, URL encoding leaves letters and digits unchanged, so for them rawTitle would be identical to title. I convert those separately:

      // convert alphabet and numbers
      String rawTitle = "$" + Integer.toHexString(alphabetOrNumberCharacter);
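A minimal sketch of that conversion in Java, assuming it is applied one character at a time: ASCII letters and digits become "$" plus their hex code, and every other character goes through URLEncoder. The class and method names (RawTitleConverter, toRawTitle) are hypothetical, and the per-character split is an assumption drawn from the two snippets above rather than the actual Spring code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RawTitleConverter {

    // Hypothetical helper: builds the rawTitle value from a title, one code point at a time.
    static String toRawTitle(String title) {
        StringBuilder out = new StringBuilder();
        title.codePoints().forEach(cp -> {
            boolean asciiLetterOrDigit = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (asciiLetterOrDigit) {
                // URLEncoder would leave these unchanged, so write "$" + hex code instead ('q' -> "$71")
                out.append('$').append(Integer.toHexString(cp));
            } else {
                // Everything else is percent-encoded as UTF-8 bytes ('나' -> "%EB%82%98");
                // the Charset overload of URLEncoder.encode needs Java 10+
                String ch = new String(Character.toChars(cp));
                out.append(URLEncoder.encode(ch, StandardCharsets.UTF_8));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRawTitle("나다"));  // %EB%82%98%EB%8B%A4
        System.out.println(toRawTitle("qw eq")); // $71$77+$65$71
    }
}
```

Handling one character at a time matters because URLEncoder output itself contains letters and digits (the hex in "%EB"), so a second pass over an already-encoded string would mangle it. URLEncoder also leaves '.', '-', '*' and '_' unchanged and turns a space into '+'.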


      To search, I convert the search text the same way and wrap it in double quotes so that it is treated as a phrase:

      // search
      String searchText = "\"" + convertedText + "\"";
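A sketch of building the search text, assuming the query is converted with exactly the same rules as rawTitle (otherwise the phrase can never match). The class name PhraseSearch and the toTextFilterJson helper are hypothetical; the helper only shows what the resulting $text filter looks like as JSON.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PhraseSearch {

    // Same hypothetical conversion as for rawTitle: "$" + hex for ASCII letters/digits,
    // URLEncoder (UTF-8, Java 10+ overload) for everything else.
    static String convert(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if ((cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9')) {
                out.append('$').append(Integer.toHexString(cp));
            } else {
                out.append(URLEncoder.encode(new String(Character.toChars(cp)), StandardCharsets.UTF_8));
            }
        });
        return out.toString();
    }

    // Wraps the converted text in double quotes so that $text treats it as a phrase
    static String toSearchText(String userInput) {
        return "\"" + convert(userInput) + "\"";
    }

    // Builds the filter document as JSON text; only the wrapping quotes need escaping
    static String toTextFilterJson(String userInput) {
        return "{ \"$text\": { \"$search\": \"\\\"" + convert(userInput) + "\\\"\" } }";
    }

    public static void main(String[] args) {
        System.out.println(toSearchText("나다"));     // "%EB%82%98%EB%8B%A4"
        System.out.println(toTextFilterJson("나다"));
    }
}
```

Because URLEncoder turns '"' into %22 and '\' into %5C, the converted text never contains characters that need JSON escaping. In the mongo shell the same filter would be written as db.collection.find({ $text: { $search: "\"%EB%82%98%EB%8B%A4\"" } }).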

      It also needs a text index:

      // text index
      db.collection.createIndex({
          rawTitle: 'text'
      }, {
          default_language: 'none'
      })

      This trick is awkward in the CLI: to search for something, I first have to convert the text and paste the result.

      Still, it works for me, even with CJK characters.

      For example, to find '가나다' I can type just '나다'.

      The search utility converts '나다' to '%EB%82%98%EB%8B%A4',

      and I get '가나다' as a result.



      Suppose I have three documents:

      // collection
          { title: 'qweqweqwe' }
          { title: 'qweewqqwe' }
          { title: 'qw eq we' }


      If I search for 'eq' with the search utility, I get 'qweqweqwe' and 'qw eq we'.
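The 'eq' result can be reproduced without a database. The sketch below is an in-memory approximation, not MongoDB's actual $text evaluation: it converts every title and the query with the same hypothetical rules and treats the phrase condition as a plain substring check on the converted strings. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class PartialMatchDemo {

    // Same hypothetical conversion as for rawTitle
    static String convert(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if ((cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9')) {
                out.append('$').append(Integer.toHexString(cp));
            } else {
                out.append(URLEncoder.encode(new String(Character.toChars(cp)), StandardCharsets.UTF_8));
            }
        });
        return out.toString();
    }

    // Approximation: a title matches when the converted query occurs inside its converted form
    static List<String> search(List<String> titles, String query) {
        String needle = convert(query);
        return titles.stream()
                .filter(t -> convert(t).contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> titles = List.of("qweqweqwe", "qweewqqwe", "qw eq we", "가나다");
        System.out.println(search(titles, "eq"));   // [qweqweqwe, qw eq we]
        System.out.println(search(titles, "나다")); // [가나다]
    }
}
```

'qweewqqwe' is excluded because it never has an 'e' directly followed by a 'q', so "$65$71" does not occur in its converted form. The '$' and '%' prefixes keep matches aligned to whole characters, since neither appears inside a hex code.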


      I think it's a useful trick, but I'm worried about performance.

      I'd like to hear your opinion. Thanks.



