  1. Documentation
  2. DOCS-13175

Text search "Search a Different Language" incomplete and misleading

Details

    • Improvement
    • Status: Open
    • Minor - P4
    • Resolution: Unresolved
    • 4.2.0
    • None
    • manual, Server

    Description

      Regarding this page: https://docs.mongodb.com/manual/reference/operator/query/text/#search-a-different-language

      There has been a lot of misunderstanding around full text search, stemming and diacritics in different languages. I find the documentation misleading and incomplete.

      Using $language in a text search with a value different from the index's default_language can lead to very unexpected results, including not finding exact matches (even without diacritics).

      In most cases you should not specify a language other than the index default, unless you know exactly how the stemmer works for each language involved.

      Incidentally, the name "default_language" is itself misleading (although correctly documented): it not only sets the default language for search queries but also determines, once and for all, how the text is indexed.

      A simple example:

       

      db.test.insert({t: "bats"})
      db.test.createIndex({t:"text"})
      db.test.countDocuments({"$text":{"$search":"bats", "$language":"none"}});
      -> 0
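
      For contrast, the same search can be run with and without $language against the same index. This is only a sketch for mongosh: the helper name compareSearchLanguages and the collection name langdemo are made up for illustration, and no expected counts are asserted here beyond what the example above reports.

```javascript
// Sketch for mongosh. `compareSearchLanguages` and the collection name
// "langdemo" are hypothetical, chosen only for this illustration.
function compareSearchLanguages(db) {
  const coll = db.getCollection("langdemo");
  coll.drop();
  coll.insertOne({ t: "bats" });
  // No default_language option is passed, so the index uses the
  // server default ("english") to stem the text at indexing time.
  coll.createIndex({ t: "text" });
  return {
    // Search term stemmed with the index's own language.
    indexLanguage: coll.countDocuments({ $text: { $search: "bats" } }),
    // Search term left unstemmed ($language: "none").
    none: coll.countDocuments({ $text: { $search: "bats", $language: "none" } }),
  };
}
```

      In mongosh this would be called as compareSearchLanguages(db); the two counts differ whenever the stemmed form stored in the index is not the literal search term.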
      

       

       

      A more thorough example: the indexed value is "passés". We search for various terms using different index/search language combinations. The results are quite unpredictable except for the french/french and none/none cases:

       

      index default_language  search $language  "pass"  "passe"  "passé"  "passes"  "passés"  "passées"
      english                 english           -       -        found    -         found     found
      english                 french            -       -        -        -         -         -
      english                 none              -       found    found    -         -         -
      french                  english           found   found    -        found     -         -
      french                  french            found   found    found    found     found     found
      french                  none              found   -        -        -         -         -
      none                    english           -       -        -        -         -         -
      none                    french            -       -        -        -         -         -
      none                    none              -       -        -        found     found     -

      (found: the document is returned; -: no match)
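
      A matrix like this one can be regenerated with a short mongosh script. The following is a sketch, not the exact commands used for this report: the helper name searchMatrix and the langmatrix_ collection prefix are hypothetical, and results may vary with the server and text index version.

```javascript
// Sketch for mongosh. `searchMatrix` and the "langmatrix_" collection
// prefix are hypothetical names used only for this illustration.
function searchMatrix(db, indexedValue, languages, terms) {
  const rows = [];
  for (const indexLang of languages) {
    // One collection per index language, since default_language is
    // fixed when the text index is created.
    const coll = db.getCollection("langmatrix_" + indexLang);
    coll.drop();
    coll.insertOne({ t: indexedValue });
    coll.createIndex({ t: "text" }, { default_language: indexLang });
    for (const searchLang of languages) {
      // Keep only the terms for which the document is returned.
      const hits = terms.filter(
        (term) =>
          coll.countDocuments({
            $text: { $search: term, $language: searchLang },
          }) > 0
      );
      rows.push({ indexLang, searchLang, hits });
    }
  }
  return rows;
}

// Example call in mongosh:
// searchMatrix(db, "passés", ["english", "french", "none"],
//   ["pass", "passe", "passé", "passes", "passés", "passées"])
//   .forEach((row) => printjson(row));
```

      Each returned row lists the search terms that matched for one index/search language pair, which corresponds to one line of the table.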

       

       

      Moreover, the example given on this page has little value: it involves neither stemming nor diacritics, and it yields the same result when no $language is specified.

      > db.articles.find( { $text: { $search: "leche", $language: "es" } } )
       
      { "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
      { "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 }
       
      > db.articles.find( { $text: { $search: "leche" } } ) 
       
      { "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
      { "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 } 
      

       

       

       

          People

            Unassigned
            tech@fractal-it.fr Mandel
            Ravind Kumar (Inactive)
            Dates

              Created:
              Updated:
              2 years, 35 weeks, 4 days ago