[SERVER-17620] RLP Tokenizer (includes C++ unit tests) Created: 16/Mar/15 Updated: 05/Dec/16 Resolved: 16/Apr/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | None |
| Fix Version/s: | 3.1.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Mark Benvenuto | Assignee: | Mark Benvenuto |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Platform 2 04/24/15 |
| Participants: |
| Description |
|
Implement a derived class of FTSTokenizer that uses the Basis Tech Rosette Linguistics Platform (RLP) API to tokenize documents. Below is the table of languages we will add. The table also lists the official language identifiers from the relevant ISO standards. We will use ISO 639-3 codes for these new languages, because ISO 639-1 identifiers are only two letters long and cannot discriminate between languages in certain language families (e.g., Farsi). For Chinese, Simplified and Traditional are not language dialects but script variants, so we use a combination of the RLP name (zhs, Simplified Chinese) and the official ISO 15924 name (Hant; note that the identifier is title-cased in the ISO spec). ISO Definitions:
|
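A minimal sketch of the "derived tokenizer class" shape the Description refers to, for illustration only. Everything here is an assumption: `TokenizerSketch` is a hypothetical, simplified stand-in for the server's FTSTokenizer interface (not its actual signature), and `WhitespaceTokenizerSketch` splits on ASCII whitespace where a real implementation would call into RLP. The language code passed to the constructor (for example the ISO 639-3 code `pes`) is only stored, to show where such an identifier could be carried.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a tokenizer base class.
// This is NOT the real FTSTokenizer declaration.
class TokenizerSketch {
public:
    virtual ~TokenizerSketch() = default;
    // Start tokenizing a new document.
    virtual void reset(std::string document) = 0;
    // Advance to the next token; returns false when no tokens remain.
    virtual bool moveNext() = 0;
    // Current token; only meaningful after moveNext() returned true.
    virtual const std::string& get() const = 0;
};

// Derived class. A real RLP-backed tokenizer would delegate to the RLP API;
// this sketch splits on ASCII whitespace as a placeholder.
class WhitespaceTokenizerSketch final : public TokenizerSketch {
public:
    // languageCode is an assumed parameter, e.g. an ISO 639-3 code.
    explicit WhitespaceTokenizerSketch(std::string languageCode)
        : _language(std::move(languageCode)) {}

    const std::string& language() const { return _language; }

    void reset(std::string document) override {
        _document = std::move(document);
        _pos = 0;
        _token.clear();
    }

    bool moveNext() override {
        // Skip leading whitespace.
        while (_pos < _document.size() && isSpace(_document[_pos])) ++_pos;
        if (_pos >= _document.size()) {
            _token.clear();
            return false;
        }
        // Consume characters up to the next whitespace.
        const std::size_t start = _pos;
        while (_pos < _document.size() && !isSpace(_document[_pos])) ++_pos;
        _token = _document.substr(start, _pos - start);
        return true;
    }

    const std::string& get() const override { return _token; }

private:
    static bool isSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string _language;
    std::string _document;
    std::size_t _pos = 0;
    std::string _token;
};

// Helper: run a tokenizer over a document and collect every token.
std::vector<std::string> tokenizeAll(TokenizerSketch& tokenizer, std::string document) {
    std::vector<std::string> tokens;
    tokenizer.reset(std::move(document));
    while (tokenizer.moveNext()) tokens.push_back(tokenizer.get());
    return tokens;
}
```

A caller would construct the derived class once per language and reuse it across documents through `reset`, which keeps any per-language setup (in a real implementation, loading the RLP language processor) out of the per-document path.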
| Comments |
| Comment by Githook User [ 16/Apr/15 ] |
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: |
| Comment by Githook User [ 16/Apr/15 ] |
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: |
| Comment by Githook User [ 16/Apr/15 ] |
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: |