[SERVER-17809] Case sensitive text queries should match against full tokens instead of stemmed tokens
Created: 30/Mar/15  Updated: 12/Oct/15  Resolved: 12/Oct/15

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 3.1.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Kamran K.
Assignee: Mark Benvenuto
Resolution: Won't Fix
Votes: 0
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related: is related to SERVER-17437, Case-sensitive mode for text search (Closed)
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

load('jstests/libs/fts.js'); // for getIDS
 
var t = db.fts_case_sensitive;
t.drop();
 
t.ensureIndex({a: 'text'});
t.insert({_id: 0, a: 'swiftly'});
t.insert({_id: 1, a: 'swiftlY'});  // capital y in the suffix
 
var res = t.find({$text: {$search: 'swiftly', $caseSensitive: true}});
 
// only the first result should be returned for the case sensitive query
assert.eq(getIDS(res), [0]);

Sprint: Platform A (10/09/15)

 Description   

$text queries with $caseSensitive: true match against stemmed tokens instead of full tokens:

> var t = db.fts_case_sensitive;
 
> t.insert({_id: 0, a: 'swiftly'});
WriteResult({ "nInserted" : 1 })
 
> t.insert({_id: 1, a: 'swiftlY'});  // capital y in the suffix
WriteResult({ "nInserted" : 1 })
 
> t.find()
{ "_id" : 0, "a" : "swiftly" }
{ "_id" : 1, "a" : "swiftlY" }
 
> t.find({$text: {$search: 'swiftly', $caseSensitive: true}});
{ "_id" : 0, "a" : "swiftly" }
{ "_id" : 1, "a" : "swiftlY" } // this is an unexpected result because of the capital y



 Comments   
Comment by Mark Benvenuto [ 12/Oct/15 ]

Per in-person discussion with jason.rassi and dan@10gen.com, we plan to abandon this work.

Here is jason.rassi's description of the issue:

The current implementation of case-sensitive matching performs an exact match on
the stemmed search token with the stemmed document token. It remains an
unintuitive property of this behavior that a case-sensitive search for "RUNNING"
will match documents containing "RUNNINg" but will not match documents
containing "rUNNING". However, after discussion, we decided that this seems
less harmful than the problems introduced by the spec'd behavior (as implemented
by Mark in this patch), and we were unable to come up with a solution that
provides sane semantics for all cases.

Our feeling is that there aren't very compelling use cases for case-sensitive
matching in combination with "stem-insensitive" matching, and so what we choose
here shouldn't matter all that much. As such, we don't currently plan to follow
up with any other behavior changes in this area.
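
For readers who want to see why "RUNNING" matches "RUNNINg" but not "rUNNING", the stemmed-token comparison described above can be sketched in plain JavaScript. This is only a toy model: `toyStem`, `matchesStemmed`, and `matchesFull` are hypothetical names, and the suffix rules (strip a trailing "ing" or "ly" regardless of case, then undouble a final consonant) are an assumption made for illustration, far simpler than the stemmer the server actually uses.

```javascript
// Toy model of the matching rule described in the comment above.
// ASSUMPTION: the suffix rules here are hypothetical and only cover the
// examples from this ticket; they are not the server's real stemmer.

// Strip a trailing "ing" or "ly" case-insensitively, leaving the case of
// the remaining characters untouched.
function toyStem(token) {
  const match = /^(.+?)(ing|ly)$/i.exec(token);
  if (!match) {
    return token;
  }
  let stem = match[1];
  const n = stem.length;
  // Undouble a final consonant pair, e.g. "RUNN" -> "RUN".
  if (
    n >= 2 &&
    stem[n - 1].toLowerCase() === stem[n - 2].toLowerCase() &&
    !/[aeiou]/i.test(stem[n - 1])
  ) {
    stem = stem.slice(0, n - 1);
  }
  return stem;
}

// Current behavior as described: exact (case-sensitive) comparison of the
// stemmed search token with the stemmed document token.
function matchesStemmed(searchToken, docToken) {
  return toyStem(searchToken) === toyStem(docToken);
}

// For contrast: exact comparison of the full, unstemmed tokens.
function matchesFull(searchToken, docToken) {
  return searchToken === docToken;
}

console.log(matchesStemmed("RUNNING", "RUNNINg")); // suffix case is discarded
console.log(matchesStemmed("RUNNING", "rUNNING")); // stem case still matters
console.log(matchesStemmed("swiftly", "swiftlY")); // the repro in this ticket
console.log(matchesFull("swiftly", "swiftlY"));    // full-token comparison
```

In this model the case difference only matters when it falls inside the stem, which is the unintuitive property the comment refers to. A full-token comparison rejects "swiftlY", as the assertion in Steps To Reproduce expects, but the sketch does not attempt to model the other cases that led to this work being abandoned.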
