[SERVER-34365] MMAPv1 and HFS+ canonicalization results in crashes Created: 06/Apr/18 Updated: 06/Dec/22 Resolved: 14/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MMAPv1, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | cwf | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Storage Execution
|
||||
| Operating System: | OS X | ||||
| Participants: | |||||
| Linked BF Score: | 123 | ||||
| Description |
|
If you create a database whose name contains a decomposable character (such as ά) on OS X, MMAPv1 will create a directory with a decomposed, canonicalized name. This causes the storage system to think the name of the database is one thing (the decomposed version), while other parts of the system (such as the DatabaseHolder and the UUID catalog) think it's the composed version that was provided. This results in an invariant failure when iterating through the databases known to storage and expecting to find each of them in the DatabaseHolder. It also results in a segfault on startup. |
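The mismatch described above can be sketched outside the server. This is a minimal illustration, not the server's code: the `catalog` dict is a hypothetical stand-in for the DatabaseHolder, and standard NFD from Python's `unicodedata` is assumed to approximate what HFS+ does to the directory name.

```python
import unicodedata

# Name as supplied by the client: U+03AC (composed form).
composed = "\u03ac"

# Assumption: emulate the file system by applying standard NFD to the name.
on_disk = unicodedata.normalize("NFD", composed)

# Hypothetical stand-in for the in-memory catalog, keyed by the composed name.
catalog = {composed: "db metadata"}

# Looking up the name read back from disk misses, because the code points differ.
found_raw = on_disk in catalog

# Normalizing both sides to the same form before comparing makes the lookup succeed.
found_normalized = unicodedata.normalize("NFC", on_disk) in catalog

print(found_raw, found_normalized)
```

The two strings render identically, so the failed lookup is not visible from the displayed names alone.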
| Comments |
| Comment by Matthew Russotto [ 06/Apr/18 ] |
|
Unix-based file systems just take the UTF-8 they're given and treat it as a sequence of (non-null) bytes. HFS+ treats its file names as Unicode, and normalizes whatever it's given to Normalization Form Canonical Decomposition (NFD). So if you create a file with U+03AC (GREEK SMALL LETTER ALPHA WITH TONOS), it decomposes to U+03B1 (GREEK SMALL LETTER ALPHA), U+0301 (COMBINING ACUTE ACCENT). These have different UTF-8 representations: 0xCE 0xAC for the composition, 0xCE 0xB1 0xCC 0x81 for the decomposition. |
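The byte sequences quoted in this comment can be checked with a short script. This is a sketch that assumes standard NFD from Python's `unicodedata` module gives the same decomposition for this character as HFS+; the helper `as_hex` is introduced here only for illustration.

```python
import unicodedata

def as_hex(data: bytes) -> str:
    """Format bytes as space-separated 0xNN values."""
    return " ".join(f"0x{b:02X}" for b in data)

composed = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
decomposed = unicodedata.normalize("NFD", composed)

# Code points after decomposition: U+03B1 followed by U+0301.
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]

composed_hex = as_hex(composed.encode("utf-8"))
decomposed_hex = as_hex(decomposed.encode("utf-8"))

print(code_points)
print(composed_hex)
print(decomposed_hex)
```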
| Comment by Eric Milkie [ 06/Apr/18 ] |
|
Why is this not a problem for Unix-based filesystems? Do they encode special characters in a different way? |