Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.4.0-rc1
Component/s: Text Search
None
ALL
The stopword list is generated without stemming. However, FTSSpec::_scoreString stems each term before checking it against the stopword list:
--- fts_spec.cpp ---
makeLower( &term );
term = tools.stemmer->stem( term );
if ( tools.stopwords->isStopWord( term ) )
As a result, index entries are generated for any stopword whose stem differs from the word itself (stem(stopword) != stopword), because the stemmed form no longer matches any entry in the unstemmed list.
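The index-side ordering can be illustrated with a minimal sketch. It assumes a hypothetical toyStem that applies only a Porter-style "y -> i" rule (enough to turn "any" into "ani", matching the repro further down) and a hypothetical two-word stopword list; inputs are assumed to be lowercase already. indexTerms is a hypothetical name that only mirrors the stem-then-check order quoted from FTSSpec::_scoreString; it is not MongoDB code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the real stemmer: applies only a Porter-style
// "y -> i" rule, which is enough to turn "any" into "ani".
std::string toyStem(const std::string& word) {
    if (word.size() > 2 && word.back() == 'y') {
        return word.substr(0, word.size() - 1) + "i";
    }
    return word;
}

// Hypothetical stopword list, stored unstemmed as described above.
const std::set<std::string> kStopWords = {"any", "the"};

// Mirrors the order used in FTSSpec::_scoreString: stem first, then look
// the *stemmed* term up in the *unstemmed* stopword list.
std::vector<std::string> indexTerms(const std::vector<std::string>& words) {
    std::vector<std::string> terms;
    for (const std::string& w : words) {
        std::string term = toyStem(w);
        if (kStopWords.count(term) > 0) {
            continue;  // only catches stopwords whose stem equals the word
        }
        terms.push_back(term);
    }
    return terms;
}
```

With this sketch, indexTerms({"the", "any"}) returns {"ani"}: "the" is filtered because stemming leaves it unchanged, while "any" is stemmed to "ani", misses the list, and is indexed.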
Note that FTSQuery::_addTerm calls isStopWord before calling stem (so you'll never see a stopword in queryDebugString):
--- fts_query.cpp ---
string word = tolowerString( term );
if ( sw->isStopWord( word ) )
    return;
word = stemmer.stem( word );
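Because the index path and the query path apply the stopword check at different points, they disagree on words like "any". One possible way to keep them consistent is to route both through a single normalization helper. The sketch below assumes that approach and places the check before stemming, as FTSQuery::_addTerm already does; normalizeTerm, toyStem and kStopWords are hypothetical names, not MongoDB's API, and the code requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for the real stemmer ("y -> i" rule only).
std::string toyStem(const std::string& word) {
    if (word.size() > 2 && word.back() == 'y') {
        return word.substr(0, word.size() - 1) + "i";
    }
    return word;
}

// Hypothetical stopword list, stored unstemmed.
const std::set<std::string> kStopWords = {"any", "the"};

// Hypothetical helper shared by the index and query paths: lowercase,
// check the (unstemmed) stopword list, and only then stem.
// Returns std::nullopt when the word is a stopword and should be skipped.
std::optional<std::string> normalizeTerm(std::string word) {
    for (char& c : word) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    if (kStopWords.count(word) > 0) {
        return std::nullopt;
    }
    return toyStem(word);
}
```

Here normalizeTerm("Any") yields no term on either path, while normalizeTerm("many") yields "mani". An alternative design is to stem the stopword list once when it is loaded and keep the stem-then-check order; either choice works as long as indexing and querying share it.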
Reproduce with:
> db.foo.ensureIndex({quote:"text"})
> db.foo.insert({quote:"any"})
> db.foo.validate().keysPerIndex
{ "test.foo.$_id_" : 1, "test.foo.$quote_text" : 1 }
> db.foo.runCommand("text",{search:"any"}).results.length
0
> db.foo.runCommand("text",{search:"ani",language:"none"}).results.length
1
Credit to kay.kim@10gen.com for original repro.