[SERVER-17535] FTS doesn't match full words when doing filtering on the set found from index scan Created: 11/Mar/15 Updated: 28/Dec/23 |
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Henrik Ingo (Inactive) | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | qi-text-search, query-44-grooming |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Integration |
| Operating System: | ALL |
| Steps To Reproduce: | |
| Participants: | |
| Description |
|
The following behavior with FTS seems inconsistent:
What happens above:
It seems to me that when scanning the index, MongoDB matches full words (post-stemming). This is expected. However, the documents found by the index scan then go through a filtering step, which actually matches parts of words. Without looking at the code, I recognize this as a common error when using regular expression libraries. To match full words, the syntax ^abcd$ should be used, but a developer may easily forget that and just search for abcd, which will match any string that includes abcd as a part of itself. |
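The regular-expression pitfall described above can be illustrated with a minimal sketch. It uses Python's standard `re` module and says nothing about how the server's filtering step is actually implemented; the helper names `substring_match`, `whole_token_match`, and `whole_word_match` are hypothetical and exist only for this illustration.

```python
import re


def substring_match(term: str, text: str) -> bool:
    """Unanchored search: true if `term` occurs anywhere in `text`."""
    return re.search(re.escape(term), text) is not None


def whole_token_match(term: str, token: str) -> bool:
    """Anchored match in the spirit of ^abcd$: `token` must be exactly `term`."""
    return re.fullmatch(re.escape(term), token) is not None


def whole_word_match(term: str, text: str) -> bool:
    """Word-boundary match: `term` must appear in `text` as a standalone word."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


if __name__ == "__main__":
    print(substring_match("abcd", "xxabcdxx"))      # unanchored: matches inside a longer word
    print(whole_token_match("abcd", "xxabcdxx"))    # anchored: rejects the longer word
    print(whole_word_match("abcd", "an abcd here")) # standalone word inside a longer text
```

Anchoring with ^ and $ only works when the pattern is applied to a single token. When the pattern is applied to a whole field value, word boundaries (`\b`) are the closer equivalent.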
| Comments |
| Comment by David Storch [ 12/Mar/15 ] |
|
My bad, you're right. Even single-term phrase queries form a logical AND with the rest of the query terms. |
| Comment by Henrik Ingo (Inactive) [ 12/Mar/15 ] |
|
Same response: Quoted words are required/AND search terms. See https://jira.mongodb.org/browse/SERVER-17533?focusedCommentId=849902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-849902 |
| Comment by David Storch [ 11/Mar/15 ] |
|
I believe that this is working as designed. Quoting from the documentation for the $text query operator:
The queries above specify a search for three individual terms (not a phrase), and therefore will act as a logical OR. This is much like a keyword search in a web search engine, where a document matches if any of the keywords are found in the document. The more keywords match, the higher the relevance score. I am going to close as Works as Designed, but please re-open if you have any further questions or concerns. Best, |
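The semantics discussed in this thread (unquoted terms act as a logical OR, quoted terms are required) can be sketched as a toy model. This is an assumption-laden illustration, not the server's algorithm: there is no stemming, the score is just a count of matched keywords standing in for relevance, and `tokenize`, `parse_query`, and `score` are hypothetical names.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (no stemming)."""
    return re.findall(r"\w+", text.lower())


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into unquoted terms and quoted (required) phrases."""
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    unquoted = re.sub(r'"[^"]+"', " ", query)
    return tokenize(unquoted), phrases


def score(query: str, doc: str) -> int:
    """Return 0 for no match, otherwise the number of distinct query words found."""
    terms, phrases = parse_query(query)
    lowered = doc.lower()
    # Quoted phrases are required (logical AND), matched on word boundaries.
    for phrase in phrases:
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered) is None:
            return 0
    # Every query word, quoted or not, contributes to the OR-style score.
    words = set(terms)
    for phrase in phrases:
        words.update(tokenize(phrase))
    doc_tokens = set(tokenize(doc))
    return sum(1 for word in words if word in doc_tokens)


if __name__ == "__main__":
    print(score("coffee shop cake", "a coffee and a cake"))  # OR: two of three keywords match
    print(score('coffee "shop"', "a coffee and a cake"))     # required phrase is missing
```

In this model a document matching any unquoted keyword is returned, with more matches giving a higher score, while a missing quoted phrase excludes the document regardless of the other terms.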