Core Server / SERVER-8873

Bug in decision logic for whether a term in a document is a stopword

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.4.0-rc3, 2.5.0
    • Affects Version/s: 2.4.0-rc1
    • Component/s: Text Search
    • Labels: None
    • Operating System: ALL

      The stopword list generation process does not perform stemming. However, FTSSpec::_scoreString stems words before checking the stopword list:

      --- fts_spec.cpp ---
      makeLower( &term );
      term = tools.stemmer->stem( term );
      if ( tools.stopwords->isStopWord( term ) )
      

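      As a result, index entries are generated for any stopword for which stem(stopword) != stopword: the stemmed form is not in the stopword list, so the check does not filter it. For example, "any" stems to "ani", which is then indexed.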
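
      The ordering problem described above can be illustrated with a minimal, self-contained sketch. It is not the server code: toyStem, toyStopWords and indexKeysStemFirst are hypothetical names, the stemmer is a stand-in that only rewrites a trailing "y" to "i" (enough to turn "any" into "ani"), and the stopword list is a three-word toy list of unstemmed forms.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the real stemmer: rewrites a trailing 'y' to 'i'
// on words longer than two characters, so "any" becomes "ani".
std::string toyStem(const std::string& word) {
    std::string out = word;
    if (out.size() > 2 && out.back() == 'y') out.back() = 'i';
    return out;
}

// Lowercases an ASCII string.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Toy stopword list (assumption): holds unstemmed forms only, mirroring the
// report's statement that list generation does not perform stemming.
const std::set<std::string>& toyStopWords() {
    static const std::set<std::string> words = {"a", "any", "the"};
    return words;
}

// Index-side order as in the fts_spec.cpp excerpt: lowercase, stem, then
// check the stopword list. Returns the terms that would become index keys.
std::vector<std::string> indexKeysStemFirst(const std::vector<std::string>& terms) {
    std::vector<std::string> keys;
    for (const std::string& raw : terms) {
        std::string term = toyStem(toLower(raw));
        if (toyStopWords().count(term) > 0) continue;  // "ani" is not listed
        keys.push_back(term);
    }
    return keys;
}
```

      In this sketch, indexKeysStemFirst({"any"}) yields a single key "ani", while indexKeysStemFirst({"The"}) yields nothing, because "the" is unchanged by the toy stemmer and is still found in the list.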

      Note that FTSQuery::_addTerm calls isStopWord before calling stem (so you'll never see a stopword in queryDebugString):

      --- fts_query.cpp ---
      string word = tolowerString( term );
      if ( sw->isStopWord( word ) )
          return;
      word = stemmer.stem( word );
      

      Reproduce with:

      > db.foo.ensureIndex({quote:"text"})
      > db.foo.insert({quote:"any"})
      > db.foo.validate().keysPerIndex
      { "test.foo.$_id_" : 1, "test.foo.$quote_text" : 1 }
      > db.foo.runCommand("text",{search:"any"}).results.length
      0
      > db.foo.runCommand("text",{search:"ani",language:"none"}).results.length
      1
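
      The shell session above can be modeled with a small sketch that puts the two orderings side by side. All names here (toyStem, isToyStopWord, indexKey, queryTerm, countMatches) are hypothetical, the stemmer and stopword list are toy stand-ins, inputs are assumed to be lowercase already, and language:"none" is modeled as skipping both stopword filtering and stemming.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in stemmer: trailing 'y' becomes 'i' ("any" -> "ani").
std::string toyStem(std::string word) {
    if (word.size() > 2 && word.back() == 'y') word.back() = 'i';
    return word;
}

// Toy stopword list (assumption) containing unstemmed forms only.
bool isToyStopWord(const std::string& word) {
    static const std::set<std::string> words = {"a", "any", "the"};
    return words.count(word) > 0;
}

// Index side, order from the fts_spec.cpp excerpt: stem, then check.
// Returns the stored key, or an empty string if the term is dropped.
std::string indexKey(const std::string& term) {
    std::string stemmed = toyStem(term);
    return isToyStopWord(stemmed) ? std::string() : stemmed;
}

// Query side, order from the fts_query.cpp excerpt: check, then stem.
// useLanguage == false models language:"none" (term is used as typed).
std::string queryTerm(const std::string& term, bool useLanguage) {
    if (!useLanguage) return term;
    if (isToyStopWord(term)) return std::string();
    return toyStem(term);
}

// Number of matches (0 or 1) for a one-term query against a one-term document.
int countMatches(const std::string& docTerm, const std::string& search, bool useLanguage) {
    const std::string key = indexKey(docTerm);
    const std::string q = queryTerm(search, useLanguage);
    return (!key.empty() && !q.empty() && key == q) ? 1 : 0;
}
```

      Under these assumptions, countMatches("any", "any", true) is 0 because the query drops "any" as a stopword, while countMatches("any", "ani", false) is 1 because the index holds the stemmed key "ani". This mirrors the 0 and 1 results in the session.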
      

      Credit to kay.kim@10gen.com for original repro.

            Assignee: J Rassi (rassi)
            Reporter: J Rassi (rassi)
            Votes: 0
            Watchers: 2
