- Type: Task
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Text Search
- Fully Compatible
- Platform 2 04/24/15
Implement a derived class of FTSTokenizer that uses the Basis Tech Rosette Linguistics API to tokenize documents.
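As a rough illustration of the shape such a derived class could take, here is a minimal sketch. It is not the server's actual code: `TokenizerBase` is a simplified, hypothetical stand-in for FTSTokenizer (whose real interface is not shown in this ticket), `RlpTokenizerSketch` is an invented name, and `segment()` is only a whitespace-splitting placeholder where a real implementation would call into the Rosette library.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the FTSTokenizer base class.
class TokenizerBase {
public:
    virtual ~TokenizerBase() = default;
    virtual void reset(const std::string& document) = 0;  // begin a new document
    virtual bool moveNext() = 0;                           // false once exhausted
    virtual const std::string& get() const = 0;            // current token
};

// Hypothetical derived tokenizer bound to one language name (e.g. "ara").
class RlpTokenizerSketch : public TokenizerBase {
public:
    explicit RlpTokenizerSketch(std::string language) : _language(std::move(language)) {}

    void reset(const std::string& document) override {
        _tokens = segment(document);
        _next = 0;
        _current.clear();
    }

    bool moveNext() override {
        if (_next >= _tokens.size()) {
            return false;
        }
        _current = _tokens[_next++];
        return true;
    }

    const std::string& get() const override { return _current; }

    const std::string& language() const { return _language; }

private:
    // Placeholder only: splits on whitespace. This is an assumption made for
    // illustration; the real tokenizer would delegate to the vendor library,
    // since whitespace splitting does not segment Chinese text.
    static std::vector<std::string> segment(const std::string& document) {
        std::istringstream in(document);
        std::vector<std::string> out;
        for (std::string tok; in >> tok;) {
            out.push_back(tok);
        }
        return out;
    }

    std::string _language;
    std::vector<std::string> _tokens;
    std::size_t _next = 0;
    std::string _current;
};
```

A caller would construct the tokenizer with a language name, call `reset()` with the document text, and then loop on `moveNext()`, reading each token with `get()`.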
Below is the table of languages we will add. The table includes the official language identifiers from the relevant ISO standards. We will use ISO-639-3 codes for these new languages because ISO-639-1 identifiers are only two letters long and cannot discriminate between languages in certain language families (e.g., Dari and Farsi both map to fa).
For Chinese, Simplified and Traditional are not language dialects but script variants, so we use a combination of the RLP (Rosette Linguistics Platform) names (e.g., zhs for Simplified Chinese) and the official ISO 15924 names (e.g., Hant for Traditional Chinese; note that the identifier is title-cased in the ISO spec).
ISO Definitions:
- ISO-639-1 - Two Letter Codes - Codes for the representation of names of languages
- ISO-639-3 - Three Letter Codes - Codes for the representation of names of languages
- ISO 15924 - Codes for the representation of names of scripts
Language | ISO-639-1 | ISO-639-3 | RLP Name | MongoDB Names | RLP Language Constant |
---|---|---|---|---|---|
Arabic | ar | ara | ara | ara,arabic | BT_LANGUAGE_ARABIC |
Dari | fa | prs | prs | prs,dari | BT_LANGUAGE_DARI |
Farsi (Persian) | fa | pes | pes | pes,iranian persian | BT_LANGUAGE_WESTERN_FARSI |
Urdu | ur | urd | urd | urd,urdu | BT_LANGUAGE_URDU |
Simplified Chinese | N/A | N/A | zhs | zhs,hans,simplified chinese | BT_LANGUAGE_SIMPLIFIED_CHINESE |
Traditional Chinese | N/A | N/A | zht | zht,hant,traditional chinese | BT_LANGUAGE_TRADITIONAL_CHINESE |
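
The MongoDB column above amounts to an alias table: several user-facing names resolve to one RLP language. A minimal sketch of such a lookup follows. The `RlpLanguage` struct and `lookupRlpLanguage` function are hypothetical names for illustration, the constants are stored as strings rather than the vendor's real enum values, and lower-casing the input before lookup is an assumption (it lets the title-cased ISO 15924 form "Hant" match the lower-case alias in the table).

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical record for one row of the table above.
struct RlpLanguage {
    std::string rlpName;      // RLP / ISO-639-3 style name, e.g. "ara"
    std::string rlpConstant;  // constant name from the table, kept as text
};

// Lower-case ASCII letters so that "Hant" and "hant" resolve identically.
// Case-insensitive matching is an assumption, not stated in the ticket.
inline std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the matching row, or nullptr if the name is not one of the aliases.
inline const RlpLanguage* lookupRlpLanguage(const std::string& name) {
    static const std::unordered_map<std::string, RlpLanguage> kAliases = {
        {"ara", {"ara", "BT_LANGUAGE_ARABIC"}},
        {"arabic", {"ara", "BT_LANGUAGE_ARABIC"}},
        {"prs", {"prs", "BT_LANGUAGE_DARI"}},
        {"dari", {"prs", "BT_LANGUAGE_DARI"}},
        {"pes", {"pes", "BT_LANGUAGE_WESTERN_FARSI"}},
        {"iranian persian", {"pes", "BT_LANGUAGE_WESTERN_FARSI"}},
        {"urd", {"urd", "BT_LANGUAGE_URDU"}},
        {"urdu", {"urd", "BT_LANGUAGE_URDU"}},
        {"zhs", {"zhs", "BT_LANGUAGE_SIMPLIFIED_CHINESE"}},
        {"hans", {"zhs", "BT_LANGUAGE_SIMPLIFIED_CHINESE"}},
        {"simplified chinese", {"zhs", "BT_LANGUAGE_SIMPLIFIED_CHINESE"}},
        {"zht", {"zht", "BT_LANGUAGE_TRADITIONAL_CHINESE"}},
        {"hant", {"zht", "BT_LANGUAGE_TRADITIONAL_CHINESE"}},
        {"traditional chinese", {"zht", "BT_LANGUAGE_TRADITIONAL_CHINESE"}},
    };
    const auto it = kAliases.find(toLowerAscii(name));
    return it == kAliases.end() ? nullptr : &it->second;
}
```

Note that the ambiguous two-letter code fa is deliberately absent from the map, so a lookup for it returns nullptr; this mirrors the reason given above for preferring ISO-639-3 identifiers.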
- is depended on by
  - SERVER-13709 Add text index support for arabic (Closed)
  - SERVER-17595 Add support for Persian language in text search (Closed)
  - SERVER-8962 increasing Chinese support of text index (Closed)
- is duplicated by
  - SERVER-13709 Add text index support for arabic (Closed)
  - SERVER-17595 Add support for Persian language in text search (Closed)