Core Server / SERVER-8873

Bug in decision logic for whether a term in a document is a stopword

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.4.0-rc3, 2.5.0
    • Affects Version/s: 2.4.0-rc1
    • Component/s: Text Search
    • Labels: None
    • Operating System: ALL

    Description

      The stopword list generation process does not perform stemming. However, FTSSpec::_scoreString stems words before checking the stopword list:

      --- fts_spec.cpp ---
      215                 makeLower( &term );
      216                 term = tools.stemmer->stem( term );
      217                 if ( tools.stopwords->isStopWord( term ) )

      As a result, index entries are generated for any stopword for which stem(stopword) != stopword. For example, "any" stems to "ani", which is not in the stopword list, so it is indexed.

      Note that FTSQuery::_addTerm calls isStopWord before calling stem (so you'll never see a stopword in queryDebugString):

      --- fts_query.cpp ---
       99             string word = tolowerString( term );
      100             if ( sw->isStopWord( word ) )
      101                 return;
      102             word = stemmer.stem( word );
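
The difference between the two orderings can be sketched in isolation. This is a minimal illustration only: toyStem, sketchIsStopWord, and the three-word stopword list are hypothetical stand-ins (not the server's stemmer or stopword list), lowercasing is omitted, and indexedConsistent shows one possible way to make the index path agree with the query path, not necessarily the change that was committed.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the real stemmer: it only rewrites a trailing
// 'y' to 'i', which is enough to model "any" -> "ani" from the report.
std::string toyStem(const std::string& word) {
    if (word.size() > 1 && word.back() == 'y')
        return word.substr(0, word.size() - 1) + "i";
    return word;
}

// Hypothetical stopword list holding unstemmed words, mirroring the
// report's point that list generation does not stem.
bool sketchIsStopWord(const std::string& word) {
    static const std::set<std::string> words = {"any", "the", "and"};
    return words.count(word) > 0;
}

// Index-side order quoted from fts_spec.cpp: stem first, then check.
// Returns true if the term would produce an index entry.
bool indexedStemFirst(const std::string& term) {
    return !sksIgnore(term);
}
```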

      Reproduce with:

      > db.foo.ensureIndex({quote:"text"})
      > db.foo.insert({quote:"any"})
      > db.foo.validate().keysPerIndex
      { "test.foo.$_id_" : 1, "test.foo.$quote_text" : 1 }
      > db.foo.runCommand("text",{search:"any"}).results.length
      0
      > db.foo.runCommand("text",{search:"ani",language:"none"}).results.length
      1
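
The three shell results above can be traced through a small model. This is a sketch under stated assumptions: reproStem and reproIsStopWord are hypothetical toys (a one-word stopword list and a stemmer that only turns a trailing 'y' into 'i'), each document and search is a single lowercase word, and useLanguage = false models language:"none" as simple tokenization with no stopword list and no stemming.

```cpp
#include <set>
#include <string>

// Hypothetical toy stemmer: trailing 'y' -> 'i' only ("any" -> "ani").
std::string reproStem(const std::string& word) {
    if (word.size() > 1 && word.back() == 'y')
        return word.substr(0, word.size() - 1) + "i";
    return word;
}

// Hypothetical unstemmed stopword list containing just "any".
bool reproIsStopWord(const std::string& word) {
    static const std::set<std::string> words = {"any"};
    return words.count(word) > 0;
}

// Index side, in the order quoted from fts_spec.cpp: stem, then check.
// Returns the index key, or an empty string if no key is generated.
std::string reproIndexKey(const std::string& term) {
    const std::string stemmed = reproStem(term);
    return reproIsStopWord(stemmed) ? std::string() : stemmed;
}

// Query side, in the order quoted from fts_query.cpp: check, then stem.
// Returns the term that is looked up, or an empty string if it is dropped.
std::string reproQueryTerm(const std::string& term, bool useLanguage) {
    if (!useLanguage)
        return term;  // models language:"none": no stopwords, no stemming
    if (reproIsStopWord(term))
        return std::string();
    return reproStem(term);
}

// Number of matches for a one-document, one-word collection.
int reproResultCount(const std::string& doc, const std::string& search,
                     bool useLanguage) {
    const std::string key = reproIndexKey(doc);
    const std::string query = reproQueryTerm(search, useLanguage);
    return (!key.empty() && !query.empty() && key == query) ? 1 : 0;
}
```

In this model, inserting "any" produces the key "ani" (hence one key in the text index), searching for "any" drops the term before lookup and finds nothing, and searching for "ani" with language "none" matches the stray key.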

      Credit to kay.kim@10gen.com for original repro.

          People

            Assignee: J Rassi (rassi)
            Reporter: J Rassi (rassi)