Core Server / SERVER-8898

Scoring algorithm exact match detection not performed correctly

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.0-rc2
    • Component/s: Text Search

      A minor bug in the text search scoring algorithm produces slightly incorrect scores for certain searches.

      When a field consists of a single token, that token's term gets a small score boost if the term exactly matches the field's contents, ignoring case. The relevant code from FTSSpec::_scoreString:

      --- fts_spec.cpp ---
      // if term is identical to the raw form of the
      // field (untokenized) give it a small boost.
      double adjustment = 1;
      if ( raw.size() == term.length() && raw.equalCaseInsensitive( term ) )
          adjustment += 0.1;
      

      The problem with the code above is that term has already been stemmed when this check runs, so the boost is given only when the field's word is unchanged by stemming, i.e. only when raw == stem(raw).
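The C++ sketch below models the check in isolation to show why stemming defeats it. It is not MongoDB source: `toyStem` is a hypothetical stemmer that only knows morning -> morn, and `equalCaseInsensitive` and `currentAdjustment` are illustrative stand-ins for `raw.equalCaseInsensitive` and the quoted lines of FTSSpec::_scoreString.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the real stemmer; it only knows one word.
std::string toyStem(const std::string& word) {
    return word == "morning" ? std::string("morn") : word;
}

// Case-insensitive equality that also requires equal lengths
// (illustrative stand-in for raw.equalCaseInsensitive).
bool equalCaseInsensitive(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Models the reported logic: `raw` is the whole field value, and the term
// it is compared against has already been stemmed from `word`.
double currentAdjustment(const std::string& raw, const std::string& word) {
    const std::string term = toyStem(word);
    double adjustment = 1;
    if (raw.size() == term.length() && equalCaseInsensitive(raw, term))
        adjustment += 0.1;
    return adjustment;
}
```

Under this model, currentAdjustment("morn", "morn") returns 1.1, but currentAdjustment("morning", "morning") returns 1, because the 7-character raw value is compared against the 4-character stem "morn".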

      In the example below, {a:"morning"} and {a:"morn"} should therefore receive the same score for the search "this morning". Instead, {a:"morn"} scores 1.1 and {a:"morning"} scores 1, because only "morn" is its own stem.

      > db.foo.insert({a:"morn"})
      > db.foo.insert({a:"morning"})
      > db.foo.insert({a:"morn!!!"})
      > db.foo.insert({a:"morning!!!"})
      > db.foo.runCommand("text",{search:"this morning",project:{_id:0}}).results
      [
      	{
      		"score" : 1.1,
      		"obj" : {
      			"a" : "morn"
      		}
      	},
      	{
      		"score" : 1,
      		"obj" : {
      			"a" : "morning"
      		}
      	},
      	{
      		"score" : 1,
      		"obj" : {
      			"a" : "morn!!!"
      		}
      	},
      	{
      		"score" : 1,
      		"obj" : {
      			"a" : "morning!!!"
      		}
      	}
      ]
      >
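One possible fix, not necessarily the change that resolved this issue, is to compare the raw field against the token as it appeared before stemming. The sketch below assumes a hypothetical one-word stemmer (`toyStem`) and a simplified tokenizer (`tokenize`) that lowercases and splits on non-alphanumeric characters; `fixedAdjustments` is an illustrative helper that returns the boost factor for each stemmed term of a field value.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the real stemmer; it only knows one word.
std::string toyStem(const std::string& word) {
    return word == "morning" ? std::string("morn") : word;
}

// Simplified tokenizer (assumption): lowercases and splits on any
// non-alphanumeric character, so "morn!!!" yields {"morn"}.
std::vector<std::string> tokenize(const std::string& raw) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : raw) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Case-insensitive equality that also requires equal lengths.
bool equalCaseInsensitive(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Proposed check: the boost depends on the unstemmed token, while the
// stemmed term is still used as the key it is credited to.
std::map<std::string, double> fixedAdjustments(const std::string& raw) {
    std::map<std::string, double> adjustments;
    for (const std::string& token : tokenize(raw)) {
        auto it = adjustments.emplace(toyStem(token), 1.0).first;
        if (equalCaseInsensitive(raw, token)) it->second = 1.0 + 0.1;
    }
    return adjustments;
}
```

With this change, "morn" and "morning" both give the term "morn" an adjustment of 1.1, while "morn!!!" and "morning!!!" still give it 1, since neither raw value equals its single token.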
      

            Assignee: Unassigned
            Reporter: J Rassi (Inactive)
            Votes: 0
            Watchers: 4
