[SERVER-34365] MMAPv1 and HFS+ canonicalization results in crashes Created: 06/Apr/18  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: cwf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Storage Execution
Operating System: OS X
Participants:
Linked BF Score: 123

 Description   

If you create a database whose name contains a decomposable character (such as ά) on OS X, MMAPv1 will create a directory with a decomposed, canonicalized name. The storage system then thinks the database has one name (the decomposed version), while other parts of the system (such as the DatabaseHolder and the UUID catalog) think it has the composed name that was provided. This triggers an invariant failure when iterating through the databases known to storage and expecting to find them in the DatabaseHolder. It also results in a segfault on startup.
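
The mismatch can be illustrated with a rough sketch, which is not the server's actual code. The `catalog` dict and `name_from_disk` are hypothetical stand-ins for an in-memory structure keyed by the user-supplied name and for a name read back from the directory listing; Python's `unicodedata` module is used only to produce the two forms of the name.

```python
import unicodedata

composed = "\u03ac"  # ά as a single code point, as the user typed it
decomposed = unicodedata.normalize("NFD", composed)  # α followed by a combining accent

# Hypothetical in-memory catalog keyed by the name the user supplied.
catalog = {composed: "db metadata"}

# Hypothetical name read back from a file system that stores the decomposed form.
name_from_disk = decomposed

# The two strings render identically but compare unequal, so the lookup misses.
found_raw = name_from_disk in catalog

# Bringing both sides to one normalization form before comparing makes them agree.
found_normalized = unicodedata.normalize("NFC", name_from_disk) in catalog

print(found_raw, found_normalized)
```

Any code path that assumes every name known to storage is also present in the catalog would hit the first, failing lookup.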



 Comments   
Comment by Matthew Russotto [ 06/Apr/18 ]

Unix-based file systems just take the UTF-8 they're given and treat it as a sequence of (non-null) bytes. HFS+ treats its file names as Unicode and normalizes whatever it's given to Normalization Form Canonical Decomposition (NFD). So if you create a file with U+03AC (GREEK SMALL LETTER ALPHA WITH TONOS), it decomposes to U+03B1 (GREEK SMALL LETTER ALPHA) followed by U+0301 (COMBINING ACUTE ACCENT). These have different UTF-8 representations: 0xCE 0xAC for the composition, 0xCE 0xB1 0xCC 0x81 for the decomposition.
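
The byte sequences above can be checked with a short Python sketch. It uses the standard `unicodedata` module to apply plain NFD, which is assumed here to match what the file system does for this particular character; the variable names are illustrative only.

```python
import unicodedata

composed = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
decomposed = unicodedata.normalize("NFD", composed)

# Code points that make up each form.
composed_points = [f"U+{ord(c):04X}" for c in composed]
decomposed_points = [f"U+{ord(c):04X}" for c in decomposed]

# UTF-8 encodings of each form.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_points, " ".join(f"0x{b:02X}" for b in composed_bytes))
print(decomposed_points, " ".join(f"0x{b:02X}" for b in decomposed_bytes))
```

A file system that stores names as opaque bytes would keep whichever of these two sequences it was handed.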

Comment by Eric Milkie [ 06/Apr/18 ]

Why is this not a problem for Unix-based filesystems? Do they encode special characters in a different way?
