-
Type: Investigation
-
Resolution: Unresolved
-
Priority: Minor - P4
-
Affects Version/s: None
-
Component/s: bsondump
-
Environment: Windows
-
Platforms 2017-01-23
When bsondump displays UTF-8 strings in JSON mode (the default), it sends the UTF-8 bytes to the display as-is, apart from escaping certain characters. On Windows, 8-bit characters sent to the console are rendered according to the current code page. On American installs of Windows, this is likely to be code page 437, the DOS code page (remember DOS?), so non-ASCII Unicode characters in the string come out as line-drawing characters and some Greek instead of the intended characters. In Europe, the code page is likely to be code page 1252 ("Windows ANSI"), which is pretty much ISO Latin-1, so the component bytes of each UTF-8 sequence show up as assorted European characters, not the intended ones.
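The effect can be reproduced off the console by decoding UTF-8 bytes with the wrong code page. The following is a minimal Python sketch for illustration only; the helper name `mojibake` is hypothetical and is not part of bsondump.

```python
def mojibake(text, code_page):
    """Return what `text` looks like when its UTF-8 bytes are
    interpreted one byte at a time in the given legacy code page."""
    raw = text.encode("utf-8")
    # errors="replace" covers the few byte values cp1252 leaves undefined.
    return raw.decode(code_page, errors="replace")


if __name__ == "__main__":
    # "é" is the two bytes C3 A9 in UTF-8.
    print(mojibake("é", "cp437"))   # ├⌐  (line drawing plus a symbol)
    print(mojibake("é", "cp1252"))  # Ã©  (two Latin-1 style characters)
    # "€" is three bytes, the first of which maps to a Greek letter in cp437.
    print(mojibake("€", "cp437"))
```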
This turns out to be fairly easy to fix: switch the console's code page to CP_UTF8 before entering the display code, and restore it to its previous value before exiting. With the UTF-8 code page active, we can treat the console as if it were a Linux terminal, and output will be correct for any characters the console font supports.
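The save, switch, and restore pattern described above might look roughly like the sketch below. It is written in Python for illustration and is not bsondump's actual code. It assumes the Win32 functions `GetConsoleOutputCP` and `SetConsoleOutputCP` are the relevant calls; the name `utf8_console_output` and the injectable `get_cp`/`set_cp` parameters are hypothetical, added so the logic can be exercised without a Windows console.

```python
import contextlib
import ctypes
import sys

CP_UTF8 = 65001  # Win32 identifier for the UTF-8 code page


@contextlib.contextmanager
def utf8_console_output(get_cp=None, set_cp=None):
    """Switch the console output code page to UTF-8 for the duration
    of the block, then restore whatever was active before."""
    if get_cp is None or set_cp is None:
        if sys.platform != "win32":
            # Assumption: non-Windows terminals already expect UTF-8.
            yield
            return
        kernel32 = ctypes.windll.kernel32
        get_cp = kernel32.GetConsoleOutputCP
        set_cp = kernel32.SetConsoleOutputCP

    previous = get_cp()  # returns 0 if the query fails
    set_cp(CP_UTF8)
    try:
        yield
    finally:
        if previous:
            # Restore even if the display code raised an exception.
            set_cp(previous)
```

Wrapping the display code in `with utf8_console_output():` keeps the restore on the exit path even if an error occurs partway through the output.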