The text search scoring algorithm has a minor bug that produces slightly incorrect scores on certain searches.
Documents whose field contains only a single token are given a score boost on a term if the term exactly matches the field contents, ignoring case. The relevant code snippet from FTSSpec::_scoreString:
--- fts_spec.cpp ---

// if term is identical to the raw form of the
// field (untokenized) give it a small boost.
double adjustment = 1;
if ( raw.size() == term.length() && raw.equalCaseInsensitive( term ) )
    adjustment += 0.1;
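The issue with the above is that term has already been stemmed by the time of the comparison, so the boost is only given when the raw field contents equal the stemmed term, i.e. when term == stem(term).
Thus, in the example below, {a:"morn"} receives the boost but {a:"morning"} does not; the correct behavior would give {a:"morning"} and {a:"morn"} the same score for the given search.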
> db.foo.insert({a:"morn"})
> db.foo.insert({a:"morning"})
> db.foo.insert({a:"morn!!!"})
> db.foo.insert({a:"morning!!!"})
> db.foo.runCommand("text",{search:"this morning",project:{_id:0}}).results
[
	{
		"score" : 1.1,
		"obj" : {
			"a" : "morn"
		}
	},
	{
		"score" : 1,
		"obj" : {
			"a" : "morning"
		}
	},
	{
		"score" : 1,
		"obj" : {
			"a" : "morn!!!"
		}
	},
	{
		"score" : 1,
		"obj" : {
			"a" : "morning!!!"
		}
	}
]
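To make the mismatch concrete, here is a minimal standalone sketch of the comparison. It is not the server code: toyStem is a hypothetical stand-in that only strips a trailing "ing", and adjustmentStemBoth is just one possible way to make the check symmetric, not necessarily the fix that should be adopted.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical toy stemmer (assumption): strips a trailing "ing".
// The real stemmer is far more involved; this only models "morning" -> "morn".
std::string toyStem(const std::string& word) {
    const std::string suffix = "ing";
    if (word.size() > suffix.size() &&
        word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return word.substr(0, word.size() - suffix.size());
    }
    return word;
}

// Case-insensitive equality for ASCII strings.
bool equalCaseInsensitive(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Models the reported behavior: the raw field is compared against the
// already-stemmed query term.
double adjustmentStemmedTerm(const std::string& raw, const std::string& queryWord) {
    const std::string term = toyStem(queryWord);
    double adjustment = 1;
    if (raw.size() == term.size() && equalCaseInsensitive(raw, term))
        adjustment += 0.1;
    return adjustment;
}

// One possible symmetric variant (assumption): stem both sides before comparing.
double adjustmentStemBoth(const std::string& raw, const std::string& queryWord) {
    double adjustment = 1;
    if (equalCaseInsensitive(toyStem(raw), toyStem(queryWord)))
        adjustment += 0.1;
    return adjustment;
}
```

Under these assumptions, adjustmentStemmedTerm boosts raw "morn" but not raw "morning" for the query word "morning", matching the scores above, while adjustmentStemBoth treats the two identically and still leaves "morn!!!" unboosted.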