[DOCS-13175] Text search "Search a Different Language" incomplete and misleading Created: 28/Oct/19  Updated: 30/Oct/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: 4.2.0
Fix Version/s: Server_Docs_20231030

Type: Improvement Priority: Minor - P4
Reporter: Mandel Mandel Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: docs-query, text_index
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 1 year, 14 weeks, 2 days ago
Epic Link: DOCSP-1769

 Description   

Description

Regarding this page: https://docs.mongodb.com/manual/reference/operator/query/text/#search-a-different-language

There has been a lot of misunderstanding around full text search, stemming and diacritics in different languages. I find the documentation misleading and incomplete.

Using $language in a text search with a value different from the index's default_language can lead to very unexpected results, including failing to find exact matches (even without diacritics).

In most cases you should not specify a language other than the default, unless you know exactly how the stemmer works for each language used.

Incidentally, the name "default_language" is itself misleading (although correctly documented): it not only sets the default language for search queries but also determines, once and for all, how text is indexed.

A simple example:

 

db.test.insert({t: "bats"})
db.test.createIndex({t:"text"})
db.test.countDocuments({"$text":{"$search":"bats", "$language":"none"}});
-> 0
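
One plausible reading of this result, sketched as a toy model in plain JavaScript (no MongoDB required): tokens are stemmed once at index time with the index's default_language, while the search term is stemmed at query time with $language. The names toyStemmers, buildToyIndex and toyCount are hypothetical, and the "english" rule is a deliberately crude stand-in that only strips a trailing "s"; it is not the stemmer MongoDB actually uses.

```javascript
// Toy model of index-time vs. query-time stemming.
// ASSUMPTION: these stemmers are illustrative stand-ins, not MongoDB's real ones.
const toyStemmers = {
  // Crude "english" rule: strip one trailing "s" from words longer than 3 chars.
  english: (word) => (word.length > 3 && word.endsWith("s") ? word.slice(0, -1) : word),
  // "none" leaves the token unchanged.
  none: (word) => word,
};

// Index time: every token is stemmed with the index's default_language.
function buildToyIndex(docs, defaultLanguage) {
  const stem = toyStemmers[defaultLanguage];
  return docs.map((text) => new Set(text.toLowerCase().split(/\s+/).map(stem)));
}

// Query time: the search term is stemmed with $language, then compared
// against the stored stems. Different stemmers can produce a miss.
function toyCount(index, term, language) {
  const stemmed = toyStemmers[language](term.toLowerCase());
  return index.filter((stems) => stems.has(stemmed)).length;
}

const index = buildToyIndex(["bats"], "english"); // stores the stem "bat"
console.log(toyCount(index, "bats", "english")); // 1: both sides stem to "bat"
console.log(toyCount(index, "bats", "none"));    // 0: "bats" is compared to "bat"
```

Under this model the exact word "bats" is never stored in the index, which would explain why an unstemmed search for it returns nothing.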

 

 

A more thorough example: the indexed value is "passés". We search for various terms using different combinations of index default_language and search $language. The results are quite unpredictable except for the french/french and none/none cases:

 

index default_language  search $language  "pass"  "passe"  "passé"  "passes"  "passés"  "passées"
english                 english           -       -        found    -         found     found
english                 french            -       -        -        -         -         -
english                 none              -       found    found    -         -         -
french                  english           found   found    -        found     -         -
french                  french            found   found    found    found     found     found
french                  none              found   -        -        -         -         -
none                    english           -       -        -        -         -         -
none                    french            -       -        -        -         -         -
none                    none              -       -        -        found     found     -

(- = not found)
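
A matrix like the one above could be regenerated with a loop along these lines. This is only a sketch: the function name textSearchMatrix and the collection prefix "lang_matrix_" are hypothetical, and it assumes a shell (mongosh or the legacy mongo shell) that provides the `db` handle. It creates one collection per index language, so it should only be run against a scratch database.

```javascript
// Sketch: for each index default_language, insert one value, build a text
// index, then record which search terms match under each search $language.
// ASSUMPTION: "lang_matrix_" is a made-up prefix for throwaway collections.
function textSearchMatrix(db, value, languages, terms) {
  const rows = [];
  for (const indexLang of languages) {
    const coll = db.getCollection("lang_matrix_" + indexLang);
    coll.drop(); // start from an empty collection
    coll.insertOne({ t: value });
    coll.createIndex({ t: "text" }, { default_language: indexLang });
    for (const searchLang of languages) {
      const found = terms.filter(
        (term) =>
          coll.countDocuments({ $text: { $search: term, $language: searchLang } }) > 0
      );
      rows.push({ indexLang, searchLang, found });
    }
  }
  return rows;
}

// Only runs inside a shell that defines `db`; skipped elsewhere.
if (typeof db !== "undefined") {
  textSearchMatrix(
    db,
    "passés",
    ["english", "french", "none"],
    ["pass", "passe", "passé", "passes", "passés", "passées"]
  ).forEach((row) => print(JSON.stringify(row)));
}
```

Each printed row lists the terms that were found for one index/search language pair, which makes it easy to compare results across server versions.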

 

 

Moreover, the example given on this page has little value: it involves neither stemming nor diacritics, and it would yield the same results with no $language specified.

> db.articles.find( { $text: { $search: "leche", $language: "es" } } )
 
{ "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
{ "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 }
 
> db.articles.find( { $text: { $search: "leche" } } ) 
 
{ "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
{ "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 } 

 

 

 

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Education Bot [ 31/Oct/22 ]

Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you!

Generated at Thu Feb 08 08:07:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.