- Type: Task
- Resolution: Won't Do
- Priority: Minor - P4
- Affects Version/s: 4.2.0
- Labels:
Description
Regarding this page: https://docs.mongodb.com/manual/reference/operator/query/text/#search-a-different-language
There has been a lot of misunderstanding around full text search, stemming and diacritics in different languages. I find the documentation misleading and incomplete.
Using $language in a text search with a value different from the index's default_language can lead to very unexpected results, including not finding exact matches (even without diacritics).
In most cases you should not specify a language other than the default, unless you know exactly how the stemmers work for each language used.
Incidentally, the name "default_language" is also misleading (though correctly documented), since it not only sets a default language for search queries but also defines once and for all how text is indexed.
A simple example:
db.test.insert({t: "bats"})
db.test.createIndex({t: "text"})
db.test.countDocuments({"$text": {"$search": "bats", "$language": "none"}}); -> 0
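A plausible reading of this result is that the index stores terms as analyzed by the index language (here the default, english, which would reduce "bats" to "bat"), while the query term is analyzed with the query's $language ("none", so "bats" stays "bats"). The sketch below models that mismatch in plain JavaScript. The `analyzers`, `buildIndex` and `countMatches` names are hypothetical, and the "english" rule is a crude stand-in, not MongoDB's actual stemmer.

```javascript
// Toy stand-ins for language analyzers. These are assumptions for
// illustration only and do NOT reproduce MongoDB's real stemmers.
const analyzers = {
  // "english": lowercase, then strip one trailing "s" (crude plural rule).
  english: (word) => word.toLowerCase().replace(/s$/, ""),
  // "none": lowercase only, no stemming.
  none: (word) => word.toLowerCase(),
};

// Split text on whitespace and analyze each token with the given language.
function analyzeText(text, language) {
  return text.split(/\s+/).filter(Boolean).map(analyzers[language]);
}

// Build a toy index: one set of analyzed terms per document.
// The language chosen here is fixed for the life of the index.
function buildIndex(docs, defaultLanguage) {
  return docs.map((text) => new Set(analyzeText(text, defaultLanguage)));
}

// Count documents that contain at least one analyzed query term.
function countMatches(index, search, language) {
  const terms = analyzeText(search, language);
  return index.filter((docTerms) => terms.some((t) => docTerms.has(t))).length;
}

const index = buildIndex(["bats"], "english"); // stored term: "bat"
console.log(countMatches(index, "bats", "english")); // 1: query term becomes "bat"
console.log(countMatches(index, "bats", "none"));    // 0: query term stays "bats"
```

In this model the exact word that was inserted cannot be found, because the lookup compares analyzed terms rather than the original text.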
A more thorough example: the value indexed is "passés". We search for various terms using different index/search languages. The outputs are quite unpredictable except for the french/french and none/none cases:
index default_language | search $language | "pass" | "passe" | "passé" | "passes" | "passés" | "passées" |
---|---|---|---|---|---|---|---|
english | english | found | found | found | |||
english | french | ||||||
english | none | found | found | ||||
french | english | found | found | found | |||
french | french | found | found | found | found | found | found |
french | none | found | |||||
none | english | ||||||
none | french | ||||||
none | none | found | found |
Moreover, the example given on this page has no value, since it exercises neither stemming nor diacritics and would yield the same result with no $language specified.
> db.articles.find( { $text: { $search: "leche", $language: "es" } } )
{ "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
{ "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 }
> db.articles.find( { $text: { $search: "leche" } } )
{ "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
{ "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 }