[SERVER-30163] Additional terms in phrase search are implicitly ignored.
Created: 15/Jul/17
Updated: 27/Dec/23

Status: Backlog
Project: Core Server
Component/s: Text Search
Affects Version/s: 3.4.6
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Dun Peal
Assignee: Backlog - Query Integration
Resolution: Unresolved
Votes: 1
Labels: mql-semantics, qi-text-search, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-22583 Allow text search to OR exact phrases Backlog
is related to DOCS-10382 Clarify if MongoDB text search is cor... Closed
Assigned Teams:
Query Integration
Operating System: ALL
Steps To Reproduce:

db.stores.insert(
   [
     { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
     { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
     { _id: 3, name: "Coffee Shop", description: "Just coffee" },
     { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
     { _id: 5, name: "Java Shopping", description: "Indonesian goods" }
   ]
)

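Create a text index, which $text requires. For example, on both string fields:

db.stores.createIndex( { name: "text", description: "text" } )

Then run the following text search: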

db.stores.find( { $text: { $search: "java \"coffee shop\"" } } )

Only the one document containing the phrase "coffee shop" is returned. Documents containing the word "java" are not returned, and in fact having the term "java" in the search has no effect whatsoever on the results returned.

 Description   

As currently specified, the logic for phrase matching implicitly ignores any additional terms in the $search string.

For example, the $search string:

"\"ssl certificate\" authority key"

is compiled to the following search:

"ssl certificate" AND ("authority" OR "key" OR "ssl" OR "certificate")

As the compiled search shows, the terms outside the phrase "ssl certificate" have no effect on which strings match. Because the phrase's own words are among the OR'd terms, any string that contains the phrase satisfies the OR clause automatically. The search therefore matches exactly the strings that contain the phrase, and no others. In particular, strings that contain "authority", "key", or both do not match unless they also contain the phrase.
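
The compiled form above can be sketched in plain JavaScript. This is a simplified in-memory model of the semantics described in this ticket, not the server's implementation. The helper names (parseSearch, tokenize, matchesCurrent) are hypothetical, and the model assumes no stemming, no stop words, no negation, and case-insensitive substring matching for phrases.

```javascript
// Simplified model only; NOT the server's implementation.
// Assumptions: no stemming, no stop words, no negated terms,
// case-insensitive matching, words split on non-alphanumeric characters.

// Hypothetical helper: lower-case a string and split it into words.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical helper: split a $search string into quoted phrases and terms.
function parseSearch(search) {
  const phrases = [];
  const terms = [];
  const re = /"([^"]*)"|(\S+)/g;
  let m;
  while ((m = re.exec(search)) !== null) {
    if (m[1] !== undefined) {
      const phrase = m[1].toLowerCase();
      phrases.push(phrase);
      // The words of a phrase also join the OR'd term list.
      terms.push(...tokenize(phrase));
    } else {
      terms.push(...tokenize(m[2]));
    }
  }
  return { phrases, terms };
}

// Current behavior as described: every phrase AND at least one term.
function matchesCurrent(search, text) {
  const { phrases, terms } = parseSearch(search);
  const lower = text.toLowerCase();
  const words = new Set(tokenize(text));
  const hasAllPhrases = phrases.every((p) => lower.includes(p));
  const hasAnyTerm = terms.some((t) => words.has(t));
  return hasAllPhrases && hasAnyTerm;
}

const search = '"ssl certificate" authority key';
console.log(matchesCurrent(search, "renew the SSL certificate")); // true
console.log(matchesCurrent(search, "certificate authority key")); // false
```

In this model the second string contains three of the four OR'd terms but still fails, because the AND with the phrase decides the outcome on its own.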

There are two problems with this behavior:

  1. It is surprising, counter-intuitive, and inconsistent with the behavior of regular text searches that do not contain phrases. As a user, I would not expect additional search terms to simply be ignored, and certainly not implicitly, without any warning. This is especially surprising given that normal matching creates an "OR" relation between the various terms.
  2. It is less powerful than it could be. Specifying additional terms should allow the user to make meaningful refinements to their search.

Therefore, I propose that a $search string containing a phrase perform a logical "OR" between the phrase and each of the other phrases or terms in the string.

So, for example, the string above will compile to:

"ssl certificate" OR "authority" OR "key"

This behavior is consistent with the behavior of regular text searches (that do not contain phrases), and provides additional functionality when the user adds additional terms, instead of implicitly ignoring them as is done currently.
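
The proposed semantics can be sketched the same way, applied to the documents from Steps To Reproduce. Again this is only an illustrative in-memory model under simplifying assumptions (no stemming, no stop words, no negation, case-insensitive matching); matchesProposed is a hypothetical name, and each document is matched against its name and description joined by a space.

```javascript
// Sketch of the PROPOSED semantics only; not an existing server behavior.
// A string matches if it contains any quoted phrase OR any standalone term.
function matchesProposed(search, text) {
  const lower = text.toLowerCase();
  const words = new Set(lower.split(/[^a-z0-9]+/).filter(Boolean));
  const re = /"([^"]*)"|(\S+)/g;
  let m;
  while ((m = re.exec(search)) !== null) {
    if (m[1] !== undefined) {
      // Quoted phrase: simplified to a case-insensitive substring test.
      if (m[1] !== "" && lower.includes(m[1].toLowerCase())) return true;
    } else if (words.has(m[2].toLowerCase())) {
      // Standalone term: must appear as a whole word.
      return true;
    }
  }
  return false;
}

const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

const hits = stores
  .filter((s) => matchesProposed('java "coffee shop"', `${s.name} ${s.description}`))
  .map((s) => s._id);

console.log(hits); // [ 1, 3, 5 ]
```

Under this model the term "java" contributes documents 1 and 5 in addition to the phrase match on document 3, which is the refinement the proposal asks for.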

Finally, the proposed behavior matches what parts of the manual currently describe, which is wrong for the existing implementation; see for example DOCS-10382.



 Comments   
Comment by Githook User [ 22/Jan/19 ]

Author:

{'email': 'stennie@cpan.org', 'name': 'Stephen Steneker', 'username': 'stennie'}

Message: Clarify current behaviour as per SERVER-30163 (Additional terms in phrase search are implicitly ignored)
Branch: v3.2
https://github.com/mongodb/docs/commit/312d6a598dc7a1504f963a7502eb2a3dfedcdc97

Comment by Githook User [ 22/Jan/19 ]

Author:

{'username': 'stennie', 'email': 'stennie@cpan.org', 'name': 'Stephen Steneker'}

Message: Clarify current behaviour as per SERVER-30163 (Additional terms in phrase search are implicitly ignored)
Branch: v3.4
https://github.com/mongodb/docs/commit/a97ab693e777e2e5af00edb8594ec8491aa551b1

Comment by Githook User [ 22/Jan/19 ]

Author:

{'email': 'stennie@cpan.org', 'name': 'Stephen Steneker', 'username': 'stennie'}

Message: Clarify current behaviour as per SERVER-30163 (Additional terms in phrase search are implicitly ignored)
Branch: v3.6
https://github.com/mongodb/docs/commit/eb9974b3c2bb4a6fa7bbfc61f6dd9e42fc9ade7d

Comment by Githook User [ 22/Jan/19 ]

Author:

{'email': 'stennie@cpan.org', 'name': 'Stephen Steneker', 'username': 'stennie'}

Message: Clarify current behaviour as per SERVER-30163 (Additional terms in phrase search are implicitly ignored)
Branch: v4.0
https://github.com/mongodb/docs/commit/111afca2f1fab9a7773182f2ff221be444efcf69

Comment by Githook User [ 22/Jan/19 ]

Author:

{'username': 'stennie', 'email': 'stennie@cpan.org', 'name': 'Stephen Steneker'}

Message: Clarify current behaviour as per SERVER-30163 (Additional terms in phrase search are implicitly ignored)
Branch: master
https://github.com/mongodb/docs/commit/68e7e46a3095384ba533204a9db1edc61b7ee048

Comment by Ian Whalen (Inactive) [ 21/Jul/17 ]

Hi Dun, we definitely recognize that the current semantics are less than ideal, but we're not focusing on Text Search improvements for the current release. We will definitely revisit this as we start planning for future releases.

Comment by David Storch [ 17/Jul/17 ]

Hi dunpeal,

Thanks for raising this issue. We've had some discussion internally around this, sparked by DOCS-10382. You are suggesting a breaking change to the query language, which we do not take lightly. However, I do agree that the current text search semantics are weak, so it is quite possibly a breaking change worth making. I'm sending this ticket to the Query Team for further consideration.

Best,
Dave

Generated at Thu Feb 08 04:22:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.