[SERVER-8401] Review Turkish stop word list Created: 30/Jan/13  Updated: 11/Jul/16  Resolved: 26/Feb/13

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.3.2
Fix Version/s: 2.4.0-rc2

Type: Task Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Paul Pedersen
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   

https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/stop_words_turkish.txt



 Comments   
Comment by Paul Pedersen [ 11/Feb/13 ]

I found three Turkish stop word lists: (1) Snowball, (2) http://www.ranks.nl/stopwords/turkish.html, and (3) http://nlp.ceng.fatih.edu.tr/blog/?p=101#more-101. Lists (1) and (2) are identical, but (3) is more complete (223 vs. 114 items). This is the same situation as Hungarian: Snowball includes a Hungarian stemmer of unknown quality. I argue for the more complete stop word list. Run through Google Translate, it looks like a reasonable list, although arguably the cardinal numbers "one", "two", "three", etc. should not be considered stop words.
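
For anyone repeating the comparison, here is a minimal sketch of how two stop word lists could be diffed and the number words filtered out. It is not the script used for this ticket. The helper names (`load_words`, `compare_lists`) and the tiny sample lists are hypothetical, and the one-word-per-line format is an assumption about the input files.

```python
# Hypothetical sketch: compare two stop word lists and drop number words.
# The sample data below is illustrative only, not the real lists.

def load_words(text):
    """Parse one word per line, skipping blank lines (assumed file format)."""
    # No case folding here: Python's str.lower() does not follow Turkish
    # dotted/dotless "i" rules, so words are kept exactly as written.
    return {line.strip() for line in text.splitlines() if line.strip()}

def compare_lists(a, b):
    """Return the words unique to each list and the words they share."""
    return {"only_a": a - b, "only_b": b - a, "common": a & b}

# Turkish for "two" and "three"; extend as needed. "bir" ("one") is left
# out of this set on purpose because it also serves as the article "a".
NUMBER_WORDS = {"iki", "üç"}

short_list = load_words("ve\nbu\nbir\n")
long_list = load_words("ve\nbu\nbir\niki\nüç\nama\n")

diff = compare_lists(short_list, long_list)
candidate = long_list - NUMBER_WORDS  # larger list minus number words
```

With real data, `diff["only_a"]` being empty would confirm that the longer list is a strict superset of the shorter one.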
