[SERVER-26891] score for text search Created: 03/Nov/16 Updated: 27/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Filip | Assignee: | Backlog - Query Integration |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | qi-text-search | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Integration
|
| Participants: |
| Description |
|
The calculated score is currently of little use because it is not consistent across queries. It should work as in other full-text search engines: a relevance value between 0 and 1. Right now I can use textScore only for sorting; I cannot filter for results with, say, more than 60% accuracy. The text search score is confusing, even more so when the indexed field is an array. |
| Comments |
| Comment by Filip [ 15/Nov/16 ] |
|
Thanks! If you guys need any further explanation I'm willing to help. |
| Comment by Ramon Fernandez Marina [ 08/Nov/16 ] |
|
Thanks for the detailed response aPoCoMiLogin. I'm sending this ticket to the Query team for evaluation. Regards, |
| Comment by Filip [ 07/Nov/16 ] |
|
To solve this issue, the score could simply be divided by its maximum possible score for that query. For example, when the books collection has a text index with language none, the score is more predictable. For example:
then the max score for a 4-word record would be 2.5, because the score is
so if my query matches 3 of the 4 possible words, then the score would be 2 because:
so getting the percentage accuracy is as easy as dividing the score by the max score:
So in some cases I can deconstruct the score, but I'm limited to non-language search with few results, which I can filter outside of the database. It would be very useful to have the score as described above (the plain score divided by the max score). I don't believe this is hard to implement, and the score would finally be reusable and more predictable for language queries. |
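The proposal in this comment (score divided by max score, then a threshold) can be sketched client-side as follows. This is only an illustration: `normalize_score` and `filter_by_accuracy` are hypothetical helper names, the result documents are made-up sample data, and the maximum score must already be known for the query (here the 2.5 figure quoted above).

```python
def normalize_score(score, max_score):
    """Map a raw text score onto [0, 1] by dividing by the known maximum."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    # Clamp in case the supplied maximum was underestimated.
    return min(score / max_score, 1.0)


def filter_by_accuracy(results, max_score, threshold=0.6):
    """Keep only results whose normalized score reaches the threshold."""
    kept = []
    for doc in results:
        accuracy = normalize_score(doc["score"], max_score)
        if accuracy >= threshold:
            kept.append({**doc, "accuracy": accuracy})
    return kept


# Hypothetical sample results; "score" stands in for the projected textScore.
sample = [{"_id": 1, "score": 2.0}, {"_id": 2, "score": 1.25}]
for doc in filter_by_accuracy(sample, max_score=2.5):
    print(doc["_id"], doc["accuracy"])
```

With the numbers from the comment, a score of 2 against a maximum of 2.5 gives 0.8 and passes a 60% threshold, while 1.25 gives 0.5 and is dropped.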
| Comment by Filip [ 03/Nov/16 ] |
|
Hi, for example: create a text index on the books collection:
then let's insert some documents:
then let's search for something:
the result will be something like this:
and now the problem. I want to match only the best results, with around 60% accuracy, so let's say that a score of 1.0 is our 100%. But what happens when I change the query:
and the result will be:
and the score changes, but I still can't say how accurate the results are. Right now the score is useful only for sorting; it would be great if I could know how accurate the results are, or even filter them to fit my needs. In short, TF-IDF (https://en.wikipedia.org/wiki/Tf%E2%80%93idf) is implemented in MongoDB in a way that makes the score hard to deconstruct and reuse. |
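To illustrate why a raw score like this shifts when the query changes, here is a minimal sketch under an assumed per-term formula. The formula is an assumption, not something stated in this ticket: it was chosen because it reproduces the 2.5 maximum for a four-word record quoted elsewhere in the thread, and it may not match the server's actual implementation. It ignores stemming and stop words (roughly the "language none" case), and `assumed_text_score` is a hypothetical name.

```python
from collections import Counter


def assumed_text_score(doc_tokens, query_terms, weight=1.0):
    """Sum per-term scores for one field under an ASSUMED formula."""
    num_tokens = len(doc_tokens)
    if num_tokens == 0:
        return 0.0
    counts = Counter(doc_tokens)
    score = 0.0
    for term in set(query_terms):
        count = counts.get(term, 0)
        if count == 0:
            continue
        # Assumption: repeated occurrences add 1, 1/2, 1/4, ...
        freq = sum(1.0 / (2 ** i) for i in range(count))
        # Assumption: the coefficient shrinks as the field gets longer.
        coeff = 0.5 * count / num_tokens + 0.5
        score += freq * coeff * weight
    return score


doc = ["alpha", "beta", "gamma", "delta"]  # hypothetical four-word record
print(assumed_text_score(doc, doc))        # 2.5
print(assumed_text_score(doc, ["alpha"]))  # 0.625
```

Under this assumption the same document scores 2.5 for a four-term query and 0.625 for a one-term query, so a fixed cut-off such as 1.0 means something different for every query.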
| Comment by Darek [ 03/Nov/16 ] |
|
I entirely agree that the scoring system should be friendlier. It would be great to get a score in the range <0, 1> rather than, as now, a score that depends on many variables, the number of documents, and so forth. I would like to show only results above some value, but I don't know how to do this because the score can be above 1 in one case and above, e.g., 100 in another. For example:
should return results as:
|
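One possible client-side workaround for the unbounded range described in this comment is to rescale each result against the best hit in the same result set. The sketch below assumes the scores have already been fetched; `scale_to_top` is a hypothetical helper and the sample scores are invented for illustration.

```python
def scale_to_top(results, score_key="score"):
    """Add a 'relative' value in [0, 1]: each score divided by the top score."""
    if not results:
        return []
    top = max(doc[score_key] for doc in results)
    if top <= 0:
        # No positive score to scale against.
        return [{**doc, "relative": 0.0} for doc in results]
    return [{**doc, "relative": doc[score_key] / top} for doc in results]


# Hypothetical scores on very different scales than 0..1.
sample = [{"_id": "a", "score": 150.0},
          {"_id": "b", "score": 90.0},
          {"_id": "c", "score": 30.0}]
for doc in scale_to_top(sample):
    print(doc["_id"], round(doc["relative"], 2))
```

Note the limitation: the relative value only says how a result compares with the best hit of that query, not how well the best hit itself matches, so it is not the absolute accuracy the ticket asks for.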
| Comment by Ramon Fernandez Marina [ 03/Nov/16 ] |
|
Can you please provide an example of how the current implementation doesn't address your needs, as well as an example of the functionality you'd like to see in a future version of MongoDB? Thanks, |