Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.4.0-rc2
Component/s: Text Search
A minor error in the text search scoring algorithm produces slightly incorrect scores on certain searches.
A document whose field contains only a single token is given a score boost on that term when the term exactly matches the field contents, ignoring case. Relevant code snippet from FTSSpec::_scoreString:
fts_spec.cpp:

    // if term is identical to the raw form of the
    // field (untokenized) give it a small boost.
    double adjustment = 1;
    if ( raw.size() == term.length() && raw.equalCaseInsensitive( term ) )
        adjustment += 0.1;
The problem is that term has already been stemmed by the time it is compared against raw, so the boost is given only when the field's token is its own stem, i.e. when term == stem(term).
Thus, in the example below, correct behavior would give {a:"morning"} and {a:"morn"} the same score for the given search; currently only {a:"morn"} receives the boost.
    > db.foo.insert({a:"morn"})
    > db.foo.insert({a:"morning"})
    > db.foo.insert({a:"morn!!!"})
    > db.foo.insert({a:"morning!!!"})
    > db.foo.runCommand("text",{search:"this morning",project:{_id:0}}).results
    [
        { "score" : 1.1, "obj" : { "a" : "morn" } },
        { "score" : 1, "obj" : { "a" : "morning" } },
        { "score" : 1, "obj" : { "a" : "morn!!!" } },
        { "score" : 1, "obj" : { "a" : "morning!!!" } }
    ]
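
The comparison described above can be reproduced in isolation. The following is a minimal sketch, not the server code: `toyStem` is a hypothetical stand-in that only strips a trailing "ing" (the real stemmer is far more involved), and `adjustmentStemmedCompare` and `adjustmentUnstemmedCompare` are illustrative names that do not exist in fts_spec.cpp. The second function shows one possible alternative, comparing the raw field against the token before stemming; this is an assumption made for illustration and not necessarily the fix that was applied.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical toy stemmer: strips a trailing "ing" and nothing else.
// It stands in for the real stemmer purely for illustration.
std::string toyStem(const std::string& word) {
    const std::string suffix = "ing";
    if (word.size() > suffix.size() &&
        word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return word.substr(0, word.size() - suffix.size());
    }
    return word;
}

// Case-insensitive equality; strings of different length are never equal.
bool equalCaseInsensitive(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Mirrors the reported behavior: the raw field is compared against the
// already-stemmed term, so only self-stemming tokens get the boost.
double adjustmentStemmedCompare(const std::string& raw, const std::string& token) {
    const std::string term = toyStem(token);
    return equalCaseInsensitive(raw, term) ? 1.1 : 1.0;
}

// Assumed alternative: compare the raw field against the unstemmed token,
// so the boost no longer depends on whether stemming changes the token.
double adjustmentUnstemmedCompare(const std::string& raw, const std::string& token) {
    return equalCaseInsensitive(raw, token) ? 1.1 : 1.0;
}
```

Under this sketch, `adjustmentStemmedCompare("morn", "morn")` yields the boost while `adjustmentStemmedCompare("morning", "morning")` does not, which matches the 1.1 versus 1 scores in the shell output above. With the unstemmed comparison both single-token fields are boosted equally, and fields such as "morn!!!" (raw form longer than the token "morn") remain unboosted either way.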