[SERVER-13998] Support for language constrained search Created: 20/May/14 Updated: 28/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance, Text Search |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | A Mare | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qi-text-search | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Integration
|
||||||||
| Description |
|
The MongoDB text search functionality became quite flexible in version 2.6 in terms of language specification in documents (and subdocuments). It is also possible (and advisable) to specify the language of the search terms when performing the search. What is still missing is a way to restrict the results by the original language of the matched stems. If a collection holds documents with text-indexed fields in various languages, searching for words in language A may well return documents that matched the query through stems derived from words in language B (i.e. with a totally different meaning). There are quite a few cases where language separation is not only advisable but required. Example:
the search will produce documents referring to French fries, which is not what we intended. Suggestion:
Of course, this would imply that the text indexes would also have to store the original language information for all collected stems (actually, for all their occurrences). As a side note, this would bring you closer to Google's web search option of restricting results to pages in a certain language. Current workaround: |
| Comments |
| Comment by A Mare [ 20/May/14 ] |
|
Well, I do not see any duplication here. SERVER-8988 is about getting different results depending on the $language specified along with the search words, which is quite normal in my opinion (to be honest, I don't see the point of SERVER-8988). I'm talking here about a completely different issue, which does indeed need language information attached to the indexed stems. |
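As a rough illustration of what "language information attached to the indexed stems" could mean, here is a toy Python sketch. It is not MongoDB code and does not reflect how the server's text index is implemented. The `toy_stem` function, its suffix rules, the `LanguageAwareIndex` class and the `same_language_only` flag are all hypothetical names invented for this example, and the stemming is deliberately crude so that an English and a Norwegian word collapse to the same stem.

```python
from collections import defaultdict

# Toy suffix rules per language: a hypothetical stand-in for a real stemmer.
TOY_SUFFIXES = {
    "en": ("es", "s"),
    "nb": ("e",),
}


def toy_stem(word, lang):
    """Lowercase the word and strip the first matching toy suffix."""
    word = word.lower()
    for suffix in TOY_SUFFIXES.get(lang, ()):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


class LanguageAwareIndex:
    """Inverted index that records the source language of every stem occurrence."""

    def __init__(self):
        # stem -> language -> set of document ids
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, doc_id, text, lang):
        for word in text.split():
            self._postings[toy_stem(word, lang)][lang].add(doc_id)

    def search(self, word, lang, same_language_only=False):
        stem = toy_stem(word, lang)
        by_lang = self._postings.get(stem, {})
        if same_language_only:
            # Only occurrences that were indexed under the query language.
            return set(by_lang.get(lang, set()))
        # Unconstrained behaviour: any occurrence of the stem matches.
        result = set()
        for ids in by_lang.values():
            result |= ids
        return result


index = LanguageAwareIndex()
index.add(1, "French fries", "en")  # toy stem of "fries" is "fri"
index.add(2, "frie valg", "nb")     # toy stem of "frie" is also "fri"

unconstrained = index.search("frie", "nb")
constrained = index.search("frie", "nb", same_language_only=True)
```

In this sketch the unconstrained search for the Norwegian word also returns the English document, because both words share a toy stem, while the constrained search returns only the document whose stem was indexed as Norwegian. The cost is the one noted in the description: the index has to keep a language tag for each stem occurrence.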