[SERVER-20307] Full word search using exact match phrase does not account for word boundaries
Created: 07/Sep/15
Updated: 28/Dec/23

Status: Backlog
Project: Core Server
Component/s: Text Search
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Shae Archibald
Assignee: Backlog - Query Integration
Resolution: Unresolved
Votes: 16
Labels: qi-text-search, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-24683 Text search ignores some phrases Closed
Assigned Teams:
Query Integration

 Description   

According to the MongoDB docs: "if a document field contains the word blueberry, a search on the term blue will not match the document".

Text searches that include an exact-match phrase of more than one word do not obey this rule.

A related Stack Overflow question can be found here:
http://stackoverflow.com/questions/31659196/mongodb-full-word-search-with-exact-phrase-not-returning-expected-results/31951843#31951843



 Comments   
Comment by David Storch [ 02/Aug/19 ]

This behavior still exists as of version 4.2:

MongoDB Enterprise > db.test.insert({ "t" : "Women's Fashion" })
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.test.ensureIndex({ "t" : "text" })
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"commitQuorum" : 1,
	"ok" : 1
}
MongoDB Enterprise > db.test.find({ "$text" : { "$search" : "\"Men's\"" } }, { "_id" : 0 })
{ "t" : "Men's Fashion" }
MongoDB Enterprise > db.test.find({ "$text" : { "$search" : "\"Men's Fashion\"" } }, { "_id" : 0 })
{ "t" : "Women's Fashion" }
{ "t" : "Men's Fashion" }

Comment by Daniel Pasette (Inactive) [ 07/Sep/15 ]

This is explained by how the phrase matcher works: it does a simple case-insensitive string match of the search phrase against the indexed field, so a phrase can match in the middle of a longer word.

See the phrase matching code here.

To fix, the phrase matching logic would have to take word boundaries into account.
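
As a rough illustration of the difference, the sketch below contrasts a plain case-insensitive substring check with a variant that also requires a non-word character (or the end of the string) on both sides of the hit. The function names (naivePhraseMatch, boundaryAwarePhraseMatch, isWordChar) are hypothetical and the definition of a "word character" is an assumption made for this example; none of this is the server's actual phrase matching code.

```javascript
// Hypothetical helpers for illustration only; not the server's implementation.

// Naive approach: case-insensitive substring search, as described in the comment above.
function naivePhraseMatch(fieldText, phrase) {
  return fieldText.toLowerCase().includes(phrase.toLowerCase());
}

// Assumption: a "word character" is a letter, a digit, or an apostrophe.
// A real tokenizer would follow the text index's language-specific delimiter rules.
function isWordChar(ch) {
  return /[\p{L}\p{N}']/u.test(ch);
}

// Boundary-aware approach: accept a hit only if it is not glued to a
// word character on either side.
function boundaryAwarePhraseMatch(fieldText, phrase) {
  const haystack = fieldText.toLowerCase();
  const needle = phrase.toLowerCase();
  if (needle.length === 0) return false;

  let from = 0;
  while (true) {
    const idx = haystack.indexOf(needle, from);
    if (idx === -1) return false;

    const end = idx + needle.length;
    const startsAtBoundary = idx === 0 || !isWordChar(haystack[idx - 1]);
    const endsAtBoundary = end >= haystack.length || !isWordChar(haystack[end]);
    if (startsAtBoundary && endsAtBoundary) return true;

    // Keep scanning: a later occurrence may sit on proper boundaries.
    from = idx + 1;
  }
}

console.log(naivePhraseMatch("Women's Fashion", "Men's Fashion"));         // true
console.log(boundaryAwarePhraseMatch("Women's Fashion", "Men's Fashion")); // false
console.log(boundaryAwarePhraseMatch("Men's Fashion", "Men's Fashion"));   // true
```

With the documents from the 4.2 reproduction, the naive check matches "Women's Fashion" for the phrase "Men's Fashion", while the boundary-aware check rejects it because the hit is preceded by the letter "o".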
