[SERVER-13238] Text search in german language doesn't find 'simple' words Created: 17/Mar/14  Updated: 10/Dec/14  Resolved: 17/Mar/14

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.6.0-rc1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mike Toggweiler Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

The following example shows some cases:

db.GermanStem.insert({'text':'mieter', 'language':'german'})
db.GermanStem.ensureIndex({text: "text"})
 
db.GermanStem.runCommand('text', {search:'mieter'})
 
=> Returns no result
 
db.GermanStem.runCommand('text', {search:'miete'})
 
=> Returns result
 
---
Other example:
 
db.GermanStem.insert({'text':'Verlängerung', 'language':'german'})
db.GermanStem.runCommand('text', {search:'Verlängerung'})

 Description   

Text search in German doesn't always find 'simple' words.



 Comments   
Comment by J Rassi [ 26/Mar/14 ]

> My problem is that we don't know the search language for sure, so the user can search German, English, or French documents at the same time.

This is not currently supported. The search language specifies how the search string is parsed (and conversely, the document language specifies how the documents are parsed). There is no existing functionality that automatically detects the search string language or document language; your application has to specify the language for each search string and for each document.
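
One possible application-side workaround, sketched here under assumptions rather than offered as a supported feature: run the search once per candidate language and merge the hits by `_id`, keeping the best score for each document. The helper name `searchAllLanguages` and the injected `runTextSearch` callback are hypothetical; the callback is assumed to wrap the text command and return its `results` array of `{score, obj}` entries. Scores computed under different languages may not be strictly comparable, so the merged ordering is only a rough ranking.

```javascript
// Hypothetical helper: query once per candidate language and merge the hits.
// runTextSearch(search, language) is assumed to return the `results` array
// of the text command, i.e. entries shaped like {score: Number, obj: {_id: ...}}.
function searchAllLanguages(runTextSearch, search, languages) {
  var best = {}; // stringified _id -> best-scoring hit seen so far
  languages.forEach(function (language) {
    runTextSearch(search, language).forEach(function (hit) {
      var key = String(hit.obj._id);
      if (!(key in best) || hit.score > best[key].score) {
        best[key] = { score: hit.score, obj: hit.obj, language: language };
      }
    });
  });
  // Return the merged hits, highest score first.
  return Object.keys(best)
    .map(function (key) { return best[key]; })
    .sort(function (a, b) { return b.score - a.score; });
}

// Possible use in the mongo shell (requires a live `db`, so left commented out):
// var hits = searchAllLanguages(function (search, language) {
//   return db.GermanStem.runCommand('text', {search: search, language: language}).results;
// }, 'mieter', ['german', 'english', 'french']);
```

Injecting the query function keeps the merge logic independent of the shell or driver, at the cost of one round trip per candidate language.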

Comment by Mike Toggweiler [ 18/Mar/14 ]

@Jason:
Thank you for the quick answer. According to the "Multi Language Support" documentation, within a single collection the language can be specified on the collection itself: http://docs.mongodb.org/manual/tutorial/specify-language-for-text-index/

So I thought that this property would be enough to match the search language against the document language. My problem is that we don't know the search language for sure, so the user can search German, English, or French documents at the same time.

Is there no other way to match the search language per search result?

Comment by J Rassi [ 17/Mar/14 ]

The issue is that your search string is being parsed in English, not German.

To perform your search in German, specify the "language" option for the search, or set the default language for the index (which is used as the default value for searches/documents that have no language specified). See the documentation for the text command: <http://docs.mongodb.org/manual/reference/command/text/#dbcmd.text>.

Example with "default_language" index option:

> db.GermanStem.insert({text: 'mieter'})
> db.GermanStem.ensureIndex({text: 'text'}, {default_language: 'german'})
> db.GermanStem.runCommand('text', {search: 'mieter'}).stats.n
1

Example with "language" option to text command:

> db.GermanStem.insert({text: 'mieter', language: 'german'})
> db.GermanStem.ensureIndex({text: 'text'})
> db.GermanStem.runCommand('text', {search: 'mieter', language: 'german'}).stats.n
1
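
The fallback order described in this comment can be summarized in a small sketch. `resolveLanguage` is a hypothetical helper, not a MongoDB API; it only models which language ends up being applied, assuming the default of `english` when the index sets no `default_language`.

```javascript
// Hypothetical model of the fallback: an explicit language wins, then the
// index's default_language, then "english" (assumed server default).
function resolveLanguage(explicitLanguage, indexOptions) {
  if (explicitLanguage) return explicitLanguage;
  if (indexOptions && indexOptions.default_language) return indexOptions.default_language;
  return 'english';
}

// The situation from the original report: a document tagged as German,
// an index created without default_language, and a search without "language".
var documentLanguage = resolveLanguage('german', {}); // taken from the document's "language" field
var searchLanguage = resolveLanguage(undefined, {});  // nothing specified, falls back to the default
var languagesDiffer = documentLanguage !== searchLanguage; // document and query are parsed differently
```

With either fix from the examples above, both sides resolve to `german` and the search term is parsed the same way as the document.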
