[SERVER-8401] Review Turkish stop word list Created: 30/Jan/13 Updated: 11/Jul/16 Resolved: 26/Feb/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 2.3.2 |
| Fix Version/s: | 2.4.0-rc2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Daniel Pasette (Inactive) | Assignee: | Paul Pedersen |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/stop_words_turkish.txt |
| Comments |
| Comment by Paul Pedersen [ 11/Feb/13 ] |
|
I found three Turkish stop word lists: (1) Snowball, (2) http://www.ranks.nl/stopwords/turkish.html, (3) http://nlp.ceng.fatih.edu.tr/blog/?p=101#more-101. Lists (1) and (2) are identical, but (3) is more complete (223 vs. 114 items). Same situation as Hungarian: Snowball includes a Hungarian stemmer of unknown quality. I argue for the more complete stop word list. Google Translate shows a reasonable-looking list, although arguably the cardinal numbers "one", "two", "three", etc. should not be considered stop words. |
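
The comparison described in this comment could be reproduced with something like the sketch below. It is illustrative only: the helper names `load_stop_words` and `compare` are hypothetical, the one-word-per-line format with optional `#` comments is an assumption, and the sample words are a tiny hand-picked set, not the contents of any of the three lists. Words are deliberately not case-folded, since Turkish dotted and dotless "i" do not round-trip through default lowercasing.

```python
def load_stop_words(text):
    """Parse one-word-per-line text into a set.

    Assumed format: blank lines are skipped and anything after '#'
    is treated as a comment. No case folding is applied.
    """
    words = set()
    for line in text.splitlines():
        word = line.split("#", 1)[0].strip()
        if word:
            words.add(word)
    return words


def compare(base, candidate, exclude=frozenset()):
    """Compare two stop word sets, dropping `exclude` from the candidate first."""
    candidate = candidate - exclude
    return {
        "only_in_base": sorted(base - candidate),
        "only_in_candidate": sorted(candidate - base),
        "merged": sorted(base | candidate),
    }


# Illustrative sample data only (not the real list contents).
shorter = load_stop_words("ve\nbu\niçin\n")
longer = load_stop_words("ve\nbu\niçin\nama\niki\nüç\n")

# Numerals one might choose to keep out of the stop word list.
numerals = {"iki", "üç"}

result = compare(shorter, longer, exclude=numerals)
print(len(shorter), "vs.", len(longer), "items")
print("added by longer list:", result["only_in_candidate"])
```

Running the same comparison on the actual files would show which of the extra entries in list (3) are numerals or otherwise questionable before adopting it.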