[SERVER-8898] Scoring algorithm exact match detection not performed correctly Created: 07/Mar/13  Updated: 10/Dec/14  Resolved: 14/Dec/13

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.4.0-rc2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: J Rassi Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

There is a minor error in the text search scoring algorithm that produces slightly incorrect scores on certain searches.

Documents whose field contains only a single token are given a score boost on that term if the term is an exact string match on the field contents, ignoring case. Relevant code snippet from FTSSpec::_scoreString:

--- fts_spec.cpp ---

    // if term is identical to the raw form of the
    // field (untokenized) give it a small boost.
    double adjustment = 1;
    if ( raw.size() == term.length() && raw.equalCaseInsensitive( term ) )
        adjustment += 0.1;

The issue with the above is that term has already been stemmed by the time it is compared against raw; thus the boost is given only when the raw field contents equal their own stem, i.e. when term == stem(term).

Thus, in the example below, correct behavior would give {a:"morning"} and {a:"morn"} the same score for the given search; instead, only {a:"morn"} receives the boost.

> db.foo.insert({a:"morn"})
> db.foo.insert({a:"morning"})
> db.foo.insert({a:"morn!!!"})
> db.foo.insert({a:"morning!!!"})
> db.foo.runCommand("text",{search:"this morning",project:{_id:0}}).results
[
	{
		"score" : 1.1,
		"obj" : {
			"a" : "morn"
		}
	},
	{
		"score" : 1,
		"obj" : {
			"a" : "morning"
		}
	},
	{
		"score" : 1,
		"obj" : {
			"a" : "morn!!!"
		}
	},
	{
		"score" : 1,
		"obj" : {
			"a" : "morning!!!"
		}
	}
]
>



 Comments   
Comment by Eliot Horowitz (Inactive) [ 07/Mar/13 ]

This is intended behavior.
The idea is that exact matches, including matches on stems, are relevant.

Generated at Thu Feb 08 03:18:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.