Priority: Major - P3
Affects Version/s: None
Fix Version/s: 3.3.14
Component/s: Internal Code
Sprint: Platforms 2016-08-26, Platforms 2016-09-19
Using non-ASCII UTF-8 characters in a database name causes directory creation to fail on Windows when directoryperdb is enabled.
Because the BSON spec requires strings to be stored as UTF-8, strings in the
server are also UTF-8. Windows, however, uses UTF-16 for its implementation of
Unicode and as the input encoding for its Unicode APIs. We must therefore convert between our internal 8-bit characters
and Windows' 16-bit characters before making API calls. For file operations, we do this in two ways. The first is mongo::File.
When open is called on a path, the path is passed to MultiByteToWideChar, which converts the UTF-8 encoded string to UTF-16.
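To make the first mechanism concrete, here is a portable sketch of what a UTF-8 to UTF-16 conversion such as MultiByteToWideChar(CP_UTF8, ...) has to do: decode each UTF-8 sequence into a code point, then emit one UTF-16 code unit, or a surrogate pair for code points above U+FFFF. utf8ToUtf16 is a hypothetical helper, not MongoDB code, and its error handling (throwing on malformed input) is a simplification that does not mirror the Windows API's behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not MongoDB code): decode UTF-8 bytes into UTF-16
// code units, roughly what MultiByteToWideChar(CP_UTF8, ...) does on Windows.
// Malformed input throws instead of being replaced.
std::u16string utf8ToUtf16(const std::string& in) {
    // Smallest code point each sequence length may encode (rejects overlongs).
    static const std::uint32_t kMinCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80) { cp = lead; len = 1; }
        else if ((lead >> 5) == 0x6) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0xE) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (len > in.size() - i) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x2) throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < kMinCodePoint[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // BMP: one code unit
        } else {
            cp -= 0x10000;  // supplementary plane: surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes C3 A9 decode to the single unit U+00E9 (é). On Windows, wchar_t is 16 bits, so the same units would be placed in a std::wstring before calling a wide-character API.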
The second is boost::filesystem::path, which relies on the C++ locale system. A std::locale object specifies
the properties a localization might have; these properties are called facets. One such facet, codecvt, handles
conversion between character encodings. boost::filesystem::path makes a copy of the global std::locale, replaces
its codecvt facet with a custom converter object, and saves the resulting locale globally for use in path operations.
When a path is constructed or appended to, that codecvt is used, if necessary, to convert the supplied string into the operating system's native character format (UTF-16 wchar_t on Windows).
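The locale/facet mechanism that boost::filesystem::path relies on can be illustrated with the standard library alone. This sketch fetches the codecvt<wchar_t, char, mbstate_t> facet from a std::locale with std::use_facet and calls its in() member to widen a narrow path. widenWithLocale is a hypothetical helper, not Boost's implementation; the point is only that the result is whatever the locale's codecvt produces, which is why replacing that facet changes how every path is widened.

```cpp
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

using WideCvt = std::codecvt<wchar_t, char, std::mbstate_t>;

// Hypothetical helper: widen a narrow string using whatever codecvt facet
// the given locale carries. Swapping the locale swaps the conversion.
std::wstring widenWithLocale(const std::string& narrow, const std::locale& loc) {
    if (narrow.empty()) return std::wstring();
    const WideCvt& cvt = std::use_facet<WideCvt>(loc);
    std::mbstate_t state{};
    // Common encodings produce no more wchar_t units than input bytes,
    // so narrow.size() output slots are enough.
    std::wstring out(narrow.size(), L'\0');
    const char* fromNext = nullptr;
    wchar_t* toNext = nullptr;
    WideCvt::result r = cvt.in(state,
                               narrow.data(), narrow.data() + narrow.size(), fromNext,
                               &out[0], &out[0] + out.size(), toNext);
    if (r != WideCvt::ok) {
        throw std::runtime_error("codecvt could not convert the path");
    }
    out.resize(static_cast<std::size_t>(toNext - &out[0]));
    return out;
}
```

With std::locale::classic(), an ASCII path such as data/db/test converts unchanged; whether non-ASCII bytes convert, and to what, depends entirely on the facet in the supplied locale.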
The global std::locale itself is left unchanged. Unfortunately, boost::filesystem's codecvt implementation, windows_file_codecvt, is incomplete:
it interprets 8-bit strings using either the ANSI code page or the OS's OEM code page rather than UTF-8, so any non-ASCII UTF-8 input is converted incorrectly.
Because two different mechanisms are used, it appears that we compute an incorrect directory name with boost::filesystem::path, create that incorrect directory, and then attempt to create a file under the correct path. The directory in the correct file path does not exist, so file creation fails.
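The mismatch can be reproduced without Windows. In this sketch, decodeAsLatin1 is a hypothetical stand-in for windows_file_codecvt running with a Western ANSI code page: each byte becomes one UTF-16 unit of the same value. ISO-8859-1 is used as a simplification; Windows-1252 differs from it only in bytes 0x80 to 0x9F, which this example does not use.

```cpp
#include <string>

// Hypothetical stand-in for windows_file_codecvt with a single-byte ANSI
// code page: every byte is widened to the code unit with the same value.
// Correct for ASCII, wrong for multi-byte UTF-8 sequences.
std::u16string decodeAsLatin1(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}
```

For a database named café (UTF-8 bytes 63 61 66 C3 A9), a correct UTF-8 decode yields the four units of "café", while decodeAsLatin1 yields the five units of "cafÃ©". A directory created under the second name cannot hold a file whose path was built from the first.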
FileAllocator's makeTempFileName and run functions will need to be modified. makeTempFileName produces a path as a std::string: although it uses boost::filesystem::path internally, it converts the path back into 8-bit characters when producing the std::string. run then calls c_str() on that std::string and uses the result without any wide-character conversion.
A plausible solution is to use Boost.Locale to generate a new std::locale object with a correct (UTF-8) codecvt and imbue it into boost::filesystem::path, as described in the
Boost.Locale documentation here: http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/default_encoding_under_windows.html