  Core Server / SERVER-29598

Support Korean language in full text search

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Text Search
    • Labels:
      None

      Description

      Add Korean to the languages supported by MongoDB full-text search (FTS).

      Original description:
      First of all, MongoDB supports stemming for major languages such as English.
      But there is no stemming for CJK languages (I am focusing on Korean in particular), so MongoDB text search is useless for Korean unless the application stems the Korean text itself.

      I am not sure whether you are interested in Korean, but in Korean a word is formed by attaching a suffix (postpositional particle) after the stem (base word), for example:

      Stem : 한글
      With suffix : 한글은, 한글이, 한글을, 한글과, 한글도, 한글처럼, ...
      

      However, the current MongoDB implementation matches the search term exactly, so a Korean word is not matched because of its suffix ("은", "는", "이", "가", "처럼", ...).

      So if MongoDB supported a range search within text search, as in the example below, we (Korean users) could use text search for Korean.

      Text : "한글은 뛰어난 언어입니다."
      Search term : "한글"
      Range search in text search : "한글" <= indexed term < "한긁"
        (where "한긁" is generated by simply incrementing the last character of the search term, as in this commit: https://github.com/mongodb/mongo/pull/1151/commits/641c3041282746aff280b685424d55926bab93b2#diff-bc6db30f2a5f9618496534d03aeabf54R108)
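
      The range idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MongoDB code or the code from the pull request: the helper names prefix_upper_bound and range_match are hypothetical, the tokenizer is a naive whitespace split, and strings are compared by Unicode code point (the same order as their UTF-8 bytes).

```python
def prefix_upper_bound(term: str) -> str:
    """Exclusive upper bound for a prefix range over `term`.

    Built by incrementing the last code point, e.g. "한글" -> "한긁".
    Hypothetical helper; assumes the last character is not U+10FFFF.
    """
    if not term:
        raise ValueError("search term must not be empty")
    return term[:-1] + chr(ord(term[-1]) + 1)


def range_match(tokens, term):
    """Return the tokens t that satisfy term <= t < prefix_upper_bound(term)."""
    upper = prefix_upper_bound(term)
    return [t for t in tokens if term <= t < upper]


text = "한글은 뛰어난 언어입니다."
term = "한글"

# Naive tokenizer (assumption): drop the final period, split on whitespace.
tokens = text.rstrip(".").split()

print(term in tokens)             # exact match: False
print(range_match(tokens, term))  # range match: ['한글은']
```

      Because every string that starts with "한글" sorts between "한글" and "한긁", the range check behaves like a prefix match, so the same bounds could in principle be applied to a sorted index rather than to a list.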
      

      Of course, this feature is not needed for languages that already have stemming.
      So I would like you to add a knob that enables or disables this range search for text search (disabled by default). We could then use text search for Korean by setting the knob to true.

      I have pushed a pull request for this simple idea to the MongoDB GitHub repository.

      This feature would help a lot of Korean users. Please seriously consider adding it.
      (I am not sure whether this feature would be useful for Japanese or Chinese, which do not put spaces between words.)

      Thanks.

              People

              • Votes: 5
              • Watchers: 13