[SERVER-8873] Bug in decision logic for whether a term in a document is a stopword
Created: 06/Mar/13 | Updated: 11/Jul/16 | Resolved: 12/Mar/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 2.4.0-rc1 |
| Fix Version/s: | 2.4.0-rc3, 2.5.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | J Rassi | Assignee: | J Rassi |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
The stopword list generation process does not perform stemming. However, FTSSpec::_scoreString stems words before checking the stopword list:
As a result, index entries are generated for any stopword for which stem(stopword) != stopword, because the stemmed form is not present in the unstemmed stopword list. Note that FTSQuery::_addTerm calls isStopWord before calling stem, so a stopword never appears in queryDebugString:
Reproduce with:
Credit to kay.kim@10gen.com for original repro. |
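The ordering mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `STOPWORDS`, `toy_stem`, `index_terms_stem_first`, and `index_terms_check_first` are hypothetical names, not MongoDB code, and the stemmer is a crude stand-in (it drops one trailing "s") rather than the stemmer the server uses.

```python
# Toy model of the ordering bug; none of these names are MongoDB APIs.

# Stopword list as generated: the entries are NOT stemmed.
STOPWORDS = {"the", "was", "has", "this"}


def toy_stem(word):
    """Crude stand-in for a real stemmer: drop one trailing 's'."""
    if len(word) > 2 and word.endswith("s"):
        return word[:-1]
    return word


def index_terms_stem_first(text):
    """Stem first, then consult the stopword list (the order the
    description attributes to the indexing path)."""
    terms = []
    for word in text.lower().split():
        stemmed = toy_stem(word)
        # "was" becomes "wa", which is not in the unstemmed list,
        # so the stopword slips through and gets indexed.
        if stemmed not in STOPWORDS:
            terms.append(stemmed)
    return terms


def index_terms_check_first(text):
    """Consult the stopword list on the raw word, then stem (the order
    the description attributes to the query path)."""
    terms = []
    for word in text.lower().split():
        if word in STOPWORDS:
            continue
        terms.append(toy_stem(word))
    return terms
```

With the input `"the cat was here"`, the stem-first variant keeps the stemmed stopword `"wa"` while the check-first variant drops it. A stopword such as `"the"`, whose toy stem equals itself, is filtered in both variants, which matches the condition stem(stopword) != stopword given in the description.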
| Comments |
| Comment by auto [ 12/Mar/13 ] |
|
Author: Jason Rassi (rassi@10gen.com) Date: 2013-03-12T14:30:21Z Message: |
| Comment by J Rassi [ 06/Mar/13 ] |
|
Failing unit test attached. |