[SERVER-24683] Text search ignores some phrases Created: 21/Jun/16  Updated: 14/Jul/16  Resolved: 21/Jun/16

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 3.0.9, 3.2.6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: David Hobbs
Assignee: Kelsey Schubert
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-20307 Full word search using exact match ph... Backlog
Operating System: ALL
Steps To Reproduce:

A simple example:

db.createCollection("test")
db.test.insert({ "Words": "some random publications"})
db.test.insert({ "Words": "cat publications" })
db.test.insert({ "Words": "i like cats" })
db.test.insert({ "Words": "car publications" })
 
db.test.createIndex({ "Words": "text" })
 
// the word "cat" in this search seems to be ignored
db.test.find( { $text: { $search: "\"cat\" \"publications\"" } } )
 
// this seems to return the same results as above
db.test.find( { $text: { $search: "\"publications\"" } } )
 
// just in case you want to see the text score
db.test.find({ $text: { $search: "\"cat\" \"publications\"" } }, { score: { $meta: "textScore"} })


 Description   

When using text search, it seems that some phrases can be ignored. I am using phrase searches to perform a logical AND on several words, for example:

$text: { $search: "\"cat\" \"publications\"" }

...to find results containing BOTH the words "cat" AND "publications". In this example, however, the phrase "cat" appears to be ignored, and the search returns every result matching "publications".

See the repro steps for a simple example to demonstrate the issue.



 Comments   
Comment by David Hobbs [ 21/Jun/16 ]

Thanks for the explanation. I had guessed that it might be something to do with the fact that "cat" is a substring of "publications", but I also knew that the text search is not supposed to match substrings. I have voted for SERVER-20307. Thanks very much for looking into this, it's very helpful for us to know why it's happening.

Comment by Kelsey Schubert [ 21/Jun/16 ]

Hi david.hobbs@prgloo.com

The work required to modify this behavior is tracked in SERVER-20307. Since this issue is manifesting in a slightly different way, I will explain a bit more about what is going on.

The word "cat" is a substring of "publications", so any document that contains the word "publications" is guaranteed to satisfy the phrase matcher's check that the Words field contains the string "cat".

As you have likely observed, searching for "car" and "publications" works as expected:

> db.test.find( { $text: { $search: "\"car\" \"publications\"" } } )
{ "_id" : ObjectId("57692c09eec0c4960674304d"), "Words" : "car publications" }
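
The behavior described above can be modeled outside the server with a small sketch. It is plain JavaScript, not MongoDB's implementation: it assumes the phrase check is a case-insensitive substring test on the Words field, which is what the explanation above implies, and it ignores stemming, scoring and the text index itself. The helper names (`substringPhraseMatch`, `wholeWordMatch`, `findAll`) are hypothetical and exist only for illustration.

```javascript
// Illustrative model only; these helpers are hypothetical, not MongoDB APIs.
const docs = [
  { Words: "some random publications" },
  { Words: "cat publications" },
  { Words: "i like cats" },
  { Words: "car publications" },
];

// Assumption: the phrase check passes if the phrase occurs anywhere in the
// field as a substring, so "cat" is found inside "publications".
function substringPhraseMatch(doc, phrase) {
  return doc.Words.toLowerCase().includes(phrase.toLowerCase());
}

// What the reporter expected: a single-word phrase must equal a whole word.
function wholeWordMatch(doc, phrase) {
  const words = doc.Words.toLowerCase().split(/\W+/);
  return words.includes(phrase.toLowerCase());
}

// Logical AND: keep documents that satisfy the matcher for every phrase.
function findAll(matcher, phrases) {
  return docs
    .filter((doc) => phrases.every((p) => matcher(doc, p)))
    .map((doc) => doc.Words);
}

const catResults = findAll(substringPhraseMatch, ["cat", "publications"]);
const carResults = findAll(substringPhraseMatch, ["car", "publications"]);
const wholeWordResults = findAll(wholeWordMatch, ["cat", "publications"]);

console.log(catResults);       // every document containing "publications"
console.log(carResults);       // only "car publications"
console.log(wholeWordResults); // only "cat publications"
```

Under the substring assumption, "cat" adds no constraint once "publications" is required, whereas "car" does, because "car" is not a substring of "publications". The whole-word variant shows the result the reporter was expecting.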

I hope this helps clarify this behavior. Please feel free to vote for SERVER-20307 and watch it for updates.

Kind regards,
Thomas
