[SERVER-8873] Bug in decision logic for whether a term in a document is a stopword
Created: 06/Mar/13  Updated: 11/Jul/16  Resolved: 12/Mar/13

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.4.0-rc1
Fix Version/s: 2.4.0-rc3, 2.5.0

Type: Bug
Priority: Major - P3
Reporter: J Rassi
Assignee: J Rassi
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-8873_test.diff    
Operating System: ALL
Participants:

 Description   

The stopword list generation process does not perform stemming. However, FTSSpec::_scoreString stems words before checking the stopword list:

--- fts_spec.cpp ---
    makeLower( &term );
    term = tools.stemmer->stem( term );
    if ( tools.stopwords->isStopWord( term ) )

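As a result, an index entry is generated for any stopword whose stem differs from the word itself (stem(stopword) != stopword): the stemmed form is looked up in a list of unstemmed words, so the lookup misses and the term is indexed.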

Note that FTSQuery::_addTerm calls isStopWord before calling stem (so you'll never see a stopword in queryDebugString):

--- fts_query.cpp ---
    string word = tolowerString( term );
    if ( sw->isStopWord( word ) )
        return;
    word = stemmer.stem( word );
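
The mismatch between the two paths can be modelled outside the server. The following is a minimal sketch, not MongoDB source: STOPWORDS, toyStem, indexTerm and queryTerm are hypothetical names, and toyStem is a toy stand-in that only rewrites a trailing "y" to "i" (enough to turn "any" into "ani", which is the form the repro suggests ends up in the index).

```javascript
// Toy model of the two code paths described above; not MongoDB source.
// STOPWORDS, toyStem, indexTerm and queryTerm are hypothetical names.

// Unstemmed stopword list, as the report describes it.
const STOPWORDS = new Set(["any", "the", "a"]);

// Toy stemmer (assumption): only rewrites a trailing "y" to "i".
function toyStem(word) {
  return word.endsWith("y") ? word.slice(0, -1) + "i" : word;
}

// Indexing path as reported: lowercase, stem, then check the stopword list.
// Returns the term that would be indexed, or null if it is dropped.
function indexTerm(term) {
  const stemmed = toyStem(term.toLowerCase());
  return STOPWORDS.has(stemmed) ? null : stemmed;
}

// Query path as reported: lowercase, check the stopword list, then stem.
// Returns the term that would be searched for, or null if it is dropped.
function queryTerm(term) {
  const word = term.toLowerCase();
  if (STOPWORDS.has(word)) return null;
  return toyStem(word);
}

console.log(indexTerm("any")); // "ani": the stopword slips into the index
console.log(queryTerm("any")); // null: the query drops it as a stopword
console.log(indexTerm("the")); // null: stem equals the word, so it is filtered
```

In this model only stopwords whose stem differs from the word are affected, which matches the condition stem(stopword) != stopword given above.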

Reproduce with:

> db.foo.ensureIndex({quote:"text"})
> db.foo.insert({quote:"any"})
> db.foo.validate().keysPerIndex
{ "test.foo.$_id_" : 1, "test.foo.$quote_text" : 1 }
> db.foo.runCommand("text",{search:"any"}).results.length
0
> db.foo.runCommand("text",{search:"ani",language:"none"}).results.length
1

Credit to kay.kim@10gen.com for original repro.



 Comments   
Comment by auto [ 12/Mar/13 ]

Author: Jason Rassi (rassi@10gen.com), 2013-03-12T14:30:21Z

Message: SERVER-8873 Correctly decide if a term in a text field is a stopword
Branch: v2.4
https://github.com/mongodb/mongo/commit/edb5f4efef0b5592f6e9449db86eca50223439ea

Comment by auto [ 12/Mar/13 ]

Author: Jason Rassi (rassi@10gen.com), 2013-03-12T14:30:21Z

Message: SERVER-8873 Correctly decide if a term in a text field is a stopword
Branch: master
https://github.com/mongodb/mongo/commit/d4193ba12b50954ede6bab1594696769893dfb13

Comment by J Rassi [ 06/Mar/13 ]

Failing unit test attached.
