[DOCS-9368] Awkward phrasing in Collations documentation Created: 18/Nov/16 Updated: 30/Oct/23 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | manual, Server |
| Affects Version/s: | None |
| Fix Version/s: | Server_Docs_20231030 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | William Cross | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: | |
| Days since reply: | 1 year, 14 weeks, 2 days ago |
| Epic Link: | DOCSP-1769 |
| Story Points: | 0.25 |
| Description |
|
On the Collation reference page, it says (in two locations): "Limited for specific use case..." (emphasis mine). I think the wording gets the point across, but these are not grammatically complete sentences. Here are the full sentences (plus the label):
This might also be a good place for an example. I'm not sure what is meant by "tie breaker" in this context, or when punctuation affects ordering. Maybe it never affects ordering for English, which is, alas, my only fluent language. (Note: I don't think I made any mistakes in this, but I'm definitely in Muphry's Law territory any time I create a Docs ticket. Also, please let me know if I should fill these out on the public-facing Jira project.) |
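To illustrate what "tie breaker" means for multi-level collation, here is a toy sketch in Python. It is not MongoDB or ICU code, and the weights in `TOY_TABLE` and the helper `sort_key` are hypothetical. Each character gets a primary (base letter), secondary (accent), and tertiary (case) weight. The key compares all primary weights of the string first, and it consults a higher level only when every lower level ties. The `strength` argument controls how many levels are compared.

```python
# Hypothetical toy weights, NOT real DUCET values:
# char -> (primary = base letter, secondary = accent, tertiary = case)
TOY_TABLE = {
    "a": (10, 1, 1),
    "A": (10, 1, 2),
    "\u00e1": (10, 2, 1),  # a with acute accent
    "b": (11, 1, 1),
    "B": (11, 1, 2),
}


def sort_key(s, strength):
    """Build a level-by-level key; later levels only break ties in earlier ones."""
    levels = []
    for level in range(strength):
        # All weights at this level for the whole string; zero weights are skipped.
        levels.append(tuple(TOY_TABLE[c][level] for c in s if TOY_TABLE[c][level] != 0))
    return tuple(levels)


words = ["B", "b", "\u00e1", "a", "A"]
# strength 3: case and accent break ties between identical base letters
print(sorted(words, key=lambda s: sort_key(s, 3)))  # ['a', 'A', 'á', 'b', 'B']
# strength 1: only base letters count, so ties keep their input order
print(sorted(words, key=lambda s: sort_key(s, 1)))  # ['á', 'a', 'A', 'B', 'b']
```

Note that an accent never outranks a difference in base letters: "áb" still sorts before "ba", because the primary weights already differ.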
| Comments |
| Comment by Education Bot [ 31/Oct/22 ] | |
|
Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you! | |
| Comment by David Storch [ 07/Dec/16 ] | |
|
steve.renaker, that is not correct. The Default Unicode Collation Element Table (DUCET) does include characters that are ignorable at levels 1 through 3 but have a non-zero weight at level 4 or 5. I believe that these quaternary or level 5 collation elements are not considered variable-weight characters, so their behavior is not affected by maxVariable or alternate. According to Table 13 of UTS #10, this includes control codes and format characters, as well as Hebrew points and Arabic tatweel. For example, if you look at the representation of the DUCET in http://www.unicode.org/Public/UCA/latest/allkeys.txt, you can find the following line:
This means that the tatweel character, U+0640, has primary, secondary, and tertiary weights of zero, which is represented by the zeros between the brackets: [.0000.0000.0000]. The DUCET does not include level 4 or level 5 weights, so I'm actually not sure right now whether U+0640 is a quaternary collation element or a level 5 collation element that only takes effect at strength:5. But hopefully this still helps to clarify. My guess is that the Arabic tatweel, as well as control and format characters, is only taken into consideration at strength:5. Also note that tailorings of the DUCET could change a character to have level 4 or level 5 weights. For example, some hypothetical language might want to change the combining character for tilde, U+0303, to only take effect at the quaternary level. By default it would be a secondary collation element like other diacritics, but that doesn't mean such a tailoring couldn't exist in theory. | |
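The following toy sketch shows what "ignorable at levels 1 through 3" means in practice. It assumes, following the guess above, that the tatweel has no level 4 weight and only matters at strength 5. It models that level as a plain code-point comparison, which is a simplification, so real ICU output may differ. The letter weights in `TOY_TABLE` and the `sort_key` helper are hypothetical. Only the all-zero weights for U+0640 come from the DUCET line discussed above.

```python
# Hypothetical toy weights (primary, secondary, tertiary), NOT real DUCET values,
# except that U+0640 gets all-zero weights as in the DUCET line above.
TOY_TABLE = {
    "\u0627": (20, 1, 1),  # ARABIC LETTER ALEF (toy weights)
    "\u0628": (21, 1, 1),  # ARABIC LETTER BEH (toy weights)
    "\u0640": (0, 0, 0),   # ARABIC TATWEEL: ignorable at levels 1-3
}


def sort_key(s, strength):
    # Levels 1-3: zero weights are skipped, so the tatweel contributes nothing.
    levels = [
        tuple(TOY_TABLE[c][i] for c in s if TOY_TABLE[c][i] != 0)
        for i in range(min(strength, 3))
    ]
    # This toy model assigns no level 4 weights at all.
    if strength == 5:
        # Assumed identical-level tie-breaker: compare raw code points.
        levels.append(tuple(ord(c) for c in s))
    return tuple(levels)


PLAIN = "\u0628\u0627"            # beh + alef
STRETCHED = "\u0628\u0640\u0627"  # beh + tatweel + alef

print(sort_key(PLAIN, 3) == sort_key(STRETCHED, 3))  # True
print(sort_key(PLAIN, 5) == sort_key(STRETCHED, 5))  # False
```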
| Comment by Steve Renaker (Inactive) [ 06/Dec/16 ] | |
|
david.storch, returning to the collation strength parameter: is it correct to say that strength only affects the sort order of strings with punctuation if alternate is set to shifted? I can't find any examples to the contrary, but I may not be using it correctly. |
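Here is a toy sketch of the question above, modeled loosely on the UTS #10 "shifted" option. The weights, the `VARIABLE` set (a stand-in for maxVariable "punct", which covers spaces and punctuation), and the helpers are all hypothetical. Under "non-ignorable", punctuation has an ordinary primary weight, so it affects ordering even at strength 1. Under "shifted", a variable character becomes ignorable at levels 1 through 3 and its primary weight moves to level 4. As a result, it only affects ordering at strength 4 or higher. Per the comment above, this does not cover non-variable characters that carry their own level 4 or 5 weights.

```python
# Hypothetical toy weights (primary, secondary, tertiary), NOT real DUCET values.
TOY_TABLE = {
    " ": (3, 1, 1),   # space: variable
    "-": (5, 1, 1),   # punctuation: variable
    "a": (10, 1, 1),
    "b": (11, 1, 1),
}
VARIABLE = {" ", "-"}  # assumed stand-in for maxVariable: "punct"
MAX_L4 = 0xFFFF        # level 4 weight given to non-variable characters when shifted


def collation_elements(s, alternate):
    """Return a 4-level weight tuple for each character."""
    out = []
    for c in s:
        p, sec, ter = TOY_TABLE[c]
        if alternate == "shifted" and c in VARIABLE:
            # Shifted: ignorable at levels 1-3; the primary weight moves to level 4.
            out.append((0, 0, 0, p))
        else:
            out.append((p, sec, ter, MAX_L4))
    return out


def sort_key(s, strength, alternate="non-ignorable"):
    ces = collation_elements(s, alternate)
    # Simplification: this toy only uses a quaternary level when shifted.
    levels = min(strength, 4 if alternate == "shifted" else 3)
    return tuple(tuple(ce[i] for ce in ces if ce[i] != 0) for i in range(levels))


print(sort_key("ab", 1) == sort_key("a-b", 1))                        # False
print(sort_key("ab", 3, "shifted") == sort_key("a-b", 3, "shifted"))  # True
print(sort_key("ab", 4, "shifted") == sort_key("a-b", 4, "shifted"))  # False
```

In this toy model, the hyphen changes the order at every strength when alternate is non-ignorable. When alternate is shifted, it only matters at strength 4.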
| Comment by Steve Renaker (Inactive) [ 28/Nov/16 ] | |
|
Reply received from engineering. Thanks to David for clarifying the use of strength level 4. |
| Comment by Steve Renaker (Inactive) [ 28/Nov/16 ] | |
|
I think we have some incorrect lines on the page this ticket is about. I've emailed engineering to confirm.