[DOCS-9368] Awkward phrasing in Collations documentation Created: 18/Nov/16  Updated: 30/Oct/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Bug Priority: Minor - P4
Reporter: William Cross Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 1 year, 14 weeks, 2 days ago
Epic Link: DOCSP-1769
Story Points: 0.25

 Description   

On the Collation reference page, it says (in two locations):

"Limited for specific use case..." (emphasis mine)

I think the wording gets the point across, but these phrases aren't grammatically complete sentences.

Here are the full sentences (plus the label):

  • "Quaternary Level. Limited for specific use case to consider punctuation when levels 1-3 ignore punctuation or for processing Japanese text."
    • Perhaps the intention was something like, "Limited to the specific use case where it's necessary to consider punctuation, but levels 1-3 ignore punctuation for the language. Also used for processing Japanese text."
  • "Identical Level. Limited for specific use case of tie breaker."
    • Perhaps the intention was something like, "Identical to level 4 in most respects. The only difference in ordering occurs in the specific use case where a tie breaker is required."

This might also be a good place for an example. I'm not sure what is meant by "tie breaker" in this context, or when punctuation affects ordering; maybe it never affects ordering for English, which is, alas, my only fluent language.
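To illustrate the "tie breaker" question, here is a minimal toy model in Python. It assumes made-up weights rather than real DUCET values. Strings are compared one level at a time up to the requested strength, and level 5 (identical) falls back to comparing code points only when every earlier level is equal. The table WEIGHTS and the functions level_key and compare are hypothetical names; real implementations compare the NFD-normalized strings at the identical level.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: (primary, secondary, tertiary, quaternary).
# These are made-up values for illustration, NOT real DUCET data.
WEIGHTS: Dict[str, Tuple[int, int, int, int]] = {
    "a": (0x20, 0x05, 0x05, 0x01),
    "A": (0x20, 0x05, 0x08, 0x01),  # same letter, differs only at the tertiary (case) level
    "b": (0x21, 0x05, 0x05, 0x01),
    "\u0001": (0, 0, 0, 0),  # modeled as completely ignorable at levels 1-4
}


def level_key(s: str, level: int) -> List[int]:
    """Return the non-zero weights of s at one level (1-4); zero weights are skipped."""
    return [w for w in (WEIGHTS[ch][level - 1] for ch in s) if w != 0]


def compare(x: str, y: str, strength: int) -> int:
    """Compare x and y level by level up to strength (1-5); return -1, 0, or 1.

    Level 5 (identical) is modeled as a final tie breaker on raw code points.
    Real implementations compare the NFD-normalized strings instead.
    """
    for level in range(1, min(strength, 4) + 1):
        kx, ky = level_key(x, level), level_key(y, level)
        if kx != ky:
            return -1 if kx < ky else 1
    if strength >= 5:
        cx, cy = [ord(c) for c in x], [ord(c) for c in y]
        if cx != cy:
            return -1 if cx < cy else 1
    return 0
```

With these weights, "ab" and "Ab" compare equal at strength 2 and differ at strength 3 (case is a tertiary difference), while "ab" and "a\u0001b" only differ at strength 5, because U+0001 is modeled as completely ignorable and only the code-point tie breaker can tell the two strings apart.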

(Note: I don't think I made any mistakes in this, but I'm definitely in Muphry's Law territory any time I create a Docs ticket. Also, please let me know if I should file these in the public-facing Jira project.)



 Comments   
Comment by Education Bot [ 31/Oct/22 ]

Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you!

Comment by David Storch [ 07/Dec/16 ]

steve.renaker, that is not correct. The Default Unicode Collation Element Table (DUCET) does include characters that are ignorable at levels 1 through 3 but have a non-zero weight at level 4 or 5. I believe that these quaternary or level 5 collation elements are not considered variable-weight characters, so their behavior is not affected by maxVariable or alternate. According to Table 13 of UTS #10, these include control codes and format characters, as well as Hebrew points and Arabic tatweel. For example, if you look at the representation of the DUCET in http://www.unicode.org/Public/UCA/latest/allkeys.txt, you can find the following line:

0640  ; [.0000.0000.0000] # ARABIC TATWEEL

This means that the tatweel character, U+0640, has primary, secondary, and tertiary weights of zero, which is represented by the zeros between the brackets: [.0000.0000.0000]. The DUCET does not include level 4 or level 5 weights, so I'm actually not sure right now whether U+0640 is a quaternary collation element or a level 5 collation element that only takes effect at strength:5. But hopefully this still helps to clarify. My guess is that the Arabic tatweel and control/format characters are only taken into consideration at strength:5.
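The bracketed notation can be read mechanically. Below is a minimal Python sketch, assuming the three-weight format quoted above: a "." or "*" marker ("*" marks a variable-weight element) followed by primary, secondary, and tertiary weights in hex. Older allkeys.txt files that carried a fourth weight field will not match this pattern. The names parse_allkeys_line, CollationElement, and ignorable_at_levels_1_to_3 are hypothetical, not part of any library.

```python
import re
from typing import List, NamedTuple, Tuple


class CollationElement(NamedTuple):
    variable: bool  # True when the element is marked with "*" (variable weight)
    primary: int
    secondary: int
    tertiary: int


# Assumed format of one collation element: "[" + ("." or "*") + three hex weights + "]",
# e.g. [.0000.0000.0000]. Files with a fourth weight field will not match.
ELEMENT_RE = re.compile(
    r"\[([.*])([0-9A-Fa-f]{4,})\.([0-9A-Fa-f]{4,})\.([0-9A-Fa-f]{4,})\]"
)


def parse_allkeys_line(line: str) -> Tuple[List[int], List[CollationElement], str]:
    """Split one allkeys.txt data line into code points, collation elements, and comment."""
    data, _, comment = line.partition("#")
    code_field, _, weights_field = data.partition(";")
    code_points = [int(cp, 16) for cp in code_field.split()]
    elements = [
        CollationElement(
            variable=m.group(1) == "*",
            primary=int(m.group(2), 16),
            secondary=int(m.group(3), 16),
            tertiary=int(m.group(4), 16),
        )
        for m in ELEMENT_RE.finditer(weights_field)
    ]
    return code_points, elements, comment.strip()


def ignorable_at_levels_1_to_3(elements: List[CollationElement]) -> bool:
    """True when every collation element has zero primary, secondary, and tertiary weights."""
    return all(e.primary == e.secondary == e.tertiary == 0 for e in elements)
```

Applied to the tatweel line quoted above, parse_allkeys_line returns the code point 0x0640, a single non-variable element whose three weights are all zero, and the comment "ARABIC TATWEEL", so ignorable_at_levels_1_to_3 returns True. As noted, the file itself says nothing about level 4 or 5 weights.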

Also note that tailorings of the DUCET could change a character to have level 4 or level 5 weights. For example, some hypothetical language might want to change the combining character for tilde, U+0303, to take effect only at the quaternary level. By default it would be a secondary collation element like other diacritics, but that doesn't mean such a tailoring couldn't exist in theory.
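The hypothetical tailoring can be sketched with a toy model. In this Python example, assuming made-up weights rather than DUCET values, BASE treats COMBINING TILDE (U+0303) as a secondary difference, and TAILORED is a hypothetical table that zeroes its first three weights and gives it only a level-4 weight. BASE, TAILORED, and compare are illustrative names only.

```python
from typing import Dict, Tuple

# Hypothetical weights: (primary, secondary, tertiary, quaternary). NOT real DUCET values.
Weights = Tuple[int, int, int, int]
BASE: Dict[str, Weights] = {
    "n": (0x30, 0x05, 0x05, 0x00),
    "\u0303": (0x00, 0x0A, 0x05, 0x00),  # COMBINING TILDE as a secondary difference
}

# Hypothetical tailoring: U+0303 keeps only a level-4 weight.
TAILORED: Dict[str, Weights] = {**BASE, "\u0303": (0x00, 0x00, 0x00, 0x0A)}


def compare(x: str, y: str, strength: int, table: Dict[str, Weights]) -> int:
    """Compare level by level up to strength (1-4), skipping zero weights; return -1, 0, or 1."""
    for level in range(strength):
        wx = [table[ch][level] for ch in x if table[ch][level] != 0]
        wy = [table[ch][level] for ch in y if table[ch][level] != 0]
        if wx != wy:
            return -1 if wx < wy else 1
    return 0
```

With BASE, "n\u0303" sorts after "n" from strength 2 upward; with TAILORED the two compare equal through strength 3 and only differ at strength 4.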

Comment by Steve Renaker (Inactive) [ 06/Dec/16 ]

david.storch, returning to the collation strength parameter: is it correct to say that strength only affects the sort order of strings containing punctuation if alternate is set to shifted? I can't find any examples to the contrary, but I may not be using it correctly.
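This question can be explored with a toy model of the alternate option. The Python sketch below assumes hypothetical weights and a simplified reading of the "shifted" variable-weighting rules in UTS #10. It treats "-" as a variable-weight element: under shifted, its primary weight moves to level 4, so "a-b" and "ab" tie through strength 3 and only differ at strength 4, whereas under non-ignorable the hyphen already makes a difference at level 1. The names HYPOTHETICAL, shifted, collation_elements, and compare are illustrative only. The model does not cover non-variable characters that are ignorable at levels 1 through 3 but carry level 4 or 5 weights.

```python
from typing import Dict, List, Tuple

# Hypothetical input elements: (variable, primary, secondary, tertiary). NOT real DUCET values.
Element = Tuple[bool, int, int, int]
HYPOTHETICAL: Dict[str, Element] = {
    "a": (False, 0x20, 0x05, 0x05),
    "b": (False, 0x21, 0x05, 0x05),
    "-": (True, 0x02, 0x05, 0x05),  # punctuation modeled as a variable-weight element
}


def shifted(elements: List[Element]) -> List[Tuple[int, int, int, int]]:
    """Simplified UTS #10 "shifted" handling, producing four weights per element.

    - variable element: L1-L3 become 0 and its old primary becomes the L4 weight
    - completely ignorable, or primary-ignorable following a variable: all zeros
    - anything else: keeps L1-L3 and gets the maximal L4 weight 0xFFFF
    """
    out: List[Tuple[int, int, int, int]] = []
    after_variable = False
    for variable, p, s, t in elements:
        if variable:
            out.append((0, 0, 0, p))
            after_variable = True
        elif p == 0 and (after_variable or (s == 0 and t == 0)):
            out.append((0, 0, 0, 0))
        else:
            out.append((p, s, t, 0xFFFF))
            after_variable = False
    return out


def collation_elements(text: str, alternate: str) -> List[Tuple[int, int, int, int]]:
    """Look up each character and apply the chosen alternate handling."""
    raw = [HYPOTHETICAL[ch] for ch in text]
    if alternate == "shifted":
        return shifted(raw)
    # "non-ignorable": every weight is kept, and this toy model adds no level-4 weight.
    return [(p, s, t, 0) for _, p, s, t in raw]


def compare(x: str, y: str, strength: int, alternate: str) -> int:
    """Compare level by level up to strength (1-4), skipping zero weights; return -1, 0, or 1."""
    ex, ey = collation_elements(x, alternate), collation_elements(y, alternate)
    for level in range(strength):
        wx = [e[level] for e in ex if e[level] != 0]
        wy = [e[level] for e in ey if e[level] != 0]
        if wx != wy:
            return -1 if wx < wy else 1
    return 0
```

In this model "a-b" sorts before "ab" at strength 4 under shifted, because the hyphen's level-4 weight (its old primary) is lower than the 0xFFFF given to ordinary letters.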

Comment by Steve Renaker (Inactive) [ 28/Nov/16 ]

Reply received from engineering. Thanks to David for clarifying the use of strength parameter 4.

Comment by Steve Renaker (Inactive) [ 28/Nov/16 ]

I think we have some incorrect lines in the page this ticket is about. I've emailed engineering to confirm.

Generated at Thu Feb 08 07:58:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.