[SERVER-9953] Text search: dutch stemmer not working? Created: 18/Jun/13 Updated: 28/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 2.4.4 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Erik Pragt | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qi-text-search, query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Integration |
| Description |
|
Hi all, I'm using MongoDB text search, and I'd like to give some feedback. I'm not sure what the best way to do that is, so I've filed this report. If there's a preferred channel, please let me know so I can use it in the future. Based on this document: http://docs.mongodb.org/manual/tutorial/create-text-index-on-multi-language-collection/, I've put together a test case, and I don't understand what's happening. This is my test data:
And I'm most interested in finding the Dutch results right now. It seems like the stemmer is not working for some words:
Note that the plural of hond ('dog') is honden ('dogs'). However, MongoDB text search doesn't seem to understand this and returns nothing. To me, this looks like a bug. |
| Comments |
| Comment by Miguel G [ 16/Nov/13 ] |
|
Hi there, I am also having issues, in my case with the Spanish text search. The stemmer apparently removes the 'o' at the end of each word (we have quite a few words that end in 'o', so you can see how problematic this is).
If I run this query:
db.collection.runCommand( "text", { search: "barco", language:"spanish" })
I get the following output and no results, even though there's a field containing the word 'barco' (notice how the 'o' has been removed in the queryDebugString field):
{ ,
But if I run the same query with english as the language:
db.collection.runCommand( "text", { search: "barco", language:"english" })
I get a result (notice that the 'o' has not been removed this time):
{
Any idea why the 'o' is being removed in Spanish? Many thanks |
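The behaviour described in the comment above could be consistent with the search term and the indexed terms being stemmed by different stemmers; that is only an assumption, since the thread does not show how the index was created. The sketch below illustrates the general idea with a toy inverted index in plain JavaScript. The names `toyStemmers`, `buildIndex`, and `search` are hypothetical, and the stemmers are simplified stand-ins, not the Snowball algorithms used by MongoDB text search.

```javascript
// Toy stemmers: hypothetical stand-ins, NOT the real Snowball algorithms.
const toyStemmers = {
  // Assumption: strip one trailing "o", mimicking the behaviour reported above.
  spanish: (word) => word.replace(/o$/, ""),
  // Assumption: leave the word untouched, as the English query appears to do for "barco".
  english: (word) => word,
};

// Build a tiny inverted index: stem -> set of document ids.
function buildIndex(docs, language) {
  const stem = toyStemmers[language];
  const index = new Map();
  docs.forEach((text, id) => {
    for (const token of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      const key = stem(token);
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(id);
    }
  });
  return index;
}

// Stem the search term with the query language, then look the stem up.
function search(index, term, language) {
  const key = toyStemmers[language](term.toLowerCase());
  return [...(index.get(key) || [])];
}

const docs = ["el barco azul"];
const englishIndex = buildIndex(docs, "english");
const spanishIndex = buildIndex(docs, "spanish");

// Same stemmer at index time and query time: the stems agree.
console.log(search(englishIndex, "barco", "english"));
// Index stores "barco" but the query is stemmed to "barc": the stems differ.
console.log(search(englishIndex, "barco", "spanish"));
// Both sides stemmed to "barc": the stems agree again.
console.log(search(spanishIndex, "barco", "spanish"));
```

In this toy model a lookup succeeds only when the index-time and query-time stems coincide, so removing the 'o' is harmless as long as it happens on both sides. Whether that applies to the collection in the comment depends on the language the text index was built with, which the thread does not state.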
| Comment by Amalia Hawkins [ 07/Nov/13 ] |
|
Hi, Erik. Sorry for the delay in response! Since this is an issue with the stemmer we use, the solution is dependent on (a) Snowball modifying the stemmer, or (b) MongoDB text search switching to a new stemmer. We will keep it in mind as we move forward with the feature. Thank you for your help! |
| Comment by Daniel Pasette (Inactive) [ 25/Jun/13 ] |
|
We need to check whether stemmer issues can be reported upstream to Snowball, the stemmer used by text search. |