[SERVER-5099] Non-ASCII text on the command line isn't handled well in Windows
Created: 26/Feb/12  Updated: 11/Jul/16  Resolved: 16/Mar/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 2.5.0

Type: Bug
Priority: Major - P3
Reporter: Tad Marshall
Assignee: Unassigned
Resolution: Done
Votes: 1
Labels: Windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows command line with text that isn't completely US-ASCII


Issue Links:
Depends
is depended on by SERVER-5333 Issues with non-ASCII characters in f... Closed
Related
related to SERVER-7496 Mongo.exe client crashes when usernam... Closed
Operating System: Windows
Participants:

 Description   

Text characters above 0x7F entered on the command line for mongod.exe, mongos.exe, mongo.exe and the other programs in the suite are not always handled correctly on Windows. Although we build the Windows versions with UNICODE and _UNICODE defined, the entry point we declare is main(), which gives us text in the 8-bit code page of the invoking command window. We would need to change the entry point to wmain() to get a wide-character UTF-16 string, and this would then require using a wide version of boost::program_options to parse the 16-bit characters.

The misbehavior seen depends on the code page of the invoking command window. In US English versions of Windows, you get the DOS-compatible code page 437 unless you have changed your configuration. In Western European versions of Windows you may get code page 1252, which matches ISO Latin 1 (and so Unicode code points up to 0xFF) everywhere except the range 0x80 to 0x9F.

Beyond these issues, there may be places where data isn't handled correctly: I found a few in the Windows Service code and am fixing them. We were getting sign-extension of characters between 0x80 and 0xFF, which turned 0xE1 ("LATIN SMALL LETTER A WITH ACUTE", 'á') into U+FFE1 ("FULLWIDTH POUND SIGN", '￡').
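The sign-extension problem described above can be reproduced in a few lines. This is only an illustrative sketch, not the Windows Service code itself: the helper names widenSignExtended and widenZeroExtended are hypothetical, and the example assumes a platform where converting 0xE1 to a signed 8-bit char yields -31 (true of the usual two's-complement compilers).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widen an 8-bit character the buggy way.
// A signed char holding 0xE1 has the value -31, so converting it to a
// 16-bit unit sign-extends and produces 0xFFE1 instead of 0x00E1.
inline std::uint16_t widenSignExtended(signed char c) {
    return static_cast<std::uint16_t>(c);
}

// Hypothetical helper: widen an 8-bit character safely.
// Going through unsigned char first zero-extends, so 0xE1 stays 0x00E1.
inline std::uint16_t widenZeroExtended(char c) {
    return static_cast<std::uint16_t>(static_cast<unsigned char>(c));
}
```

Casting through unsigned char before widening is the usual way to avoid this class of bug, whatever the default signedness of char is on the compiler in use.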

This may not be an issue for some users (US-only, or European/UK users using code page 1252) but the issue is likely to pop up repeatedly until we make the code fully Unicode-capable.



 Comments   
Comment by auto [ 16/Mar/13 ]

Author: Tad Marshall <tad@10gen.com>
Date: 2013-03-16T16:51:04Z

Message: SERVER-5099 Convert Unicode text on Windows command line for all tools

For the Windows version of mongobridge, bsondump, docgenerator, mongodump,
mongoexport, mongofiles, mongoimport, loadgenerator, mongooplog,
mongorestore, mongosniff, mongostat, and mongotop, switch to a Unicode
"wmain()" entry point and convert Unicode characters on the command line
to UTF-8. Move previous main() to a new toolMain() routine so it can be
called from both main() and wmain(). Clean up headers.
Branch: master
https://github.com/mongodb/mongo/commit/122657d540114a0d31e07da6fec50f7a1b35936a

Comment by auto [ 15/Mar/13 ]

Author: Tad Marshall <tad@10gen.com>
Date: 2013-03-15T06:55:57Z

Message: SERVER-5099 Convert Unicode text on Windows command line for test.exe

For the Windows version of test, switch to a Unicode "wmain()" entry
point and convert Unicode characters on the command line to UTF-8. Move
previous main() to a new dbtestsMain() routine so it can be called from
both main() and wmain().
Branch: master
https://github.com/mongodb/mongo/commit/dfb307e5716ad789cc2146048853d3a45077056b

Comment by auto [ 15/Mar/13 ]

Author: Tad Marshall <tad@10gen.com>
Date: 2013-03-14T20:58:56Z

Message: SERVER-5099 Convert Unicode text on Windows command line for mongos.exe

For the Windows version of mongos, switch to a Unicode "wmain()" entry
point and convert Unicode characters on the command line to UTF-8. Move
previous main() to a new mongoSMain() routine so it can be called from
both main() and wmain(). Clean up header list.
Branch: master
https://github.com/mongodb/mongo/commit/bb9594f5ab9400d0c72b400a9f7f4e598526cece

Comment by auto [ 14/Mar/13 ]

Author: Tad Marshall <tad@10gen.com>
Date: 2013-03-13T19:13:04Z

Message: SERVER-5099 Convert Unicode text on Windows command line for mongod.exe

For the Windows version of mongod, switch to a Unicode "wmain()" entry
point and convert Unicode characters on the command line to UTF-8.
Branch: master
https://github.com/mongodb/mongo/commit/2b2b5aa8f61c48719331ce3c2c3568d250964b46

Comment by auto [ 13/Mar/13 ]

Author: Tad Marshall <tad@10gen.com>
Date: 2013-03-13T16:52:49Z

Message: SERVER-5099 Fix 32-bit signed/unsigned warnings
Branch: master
https://github.com/mongodb/mongo/commit/fd5c1b3925f09937855e8a19130853281acecabb

Comment by auto [ 13/Mar/13 ]

Author: Tad Marshall <tad@10gen.com>
Date: 2013-03-13T14:21:56Z

Message: SERVER-5099 Convert UTF-16 environment to UTF-8 in WindowsCommandLine

Implement support for the third parameter (environment) of main()/wmain()
and convert the Unicode environment strings to UTF-8. Use the
environment parameter in mongo.exe.
Branch: master
https://github.com/mongodb/mongo/commit/f5352a787b042ba663dd5cc9976088f719f5651f

Comment by Tad Marshall [ 21/Mar/12 ]

I have made this change to mongo.exe as part of making UTF-8 work right in the shell (SERVER-2939). It's pretty easy to convert the UTF-16 text from wmain() to a set of UTF-8 argv[] components and then let the normal UTF-8 Linux functionality use it. I'll put the code in util/text.cpp next to the utf8/utf16 Windows code that is already there.

Comment by Tad Marshall [ 26/Feb/12 ]

A small simplification to what I described in the problem statement would be to change the Windows executables to start at wmain() to get the wide-character Unicode text from Windows, and then convert it to UTF-8 before processing it. We could do the whole command line and then parse it into argc and argv, or we could convert the argv components one at a time. This would let us stay with the 8-bit-character version of boost::program_options and make the processing code more similar between Windows and non-Windows versions. Since we want Unicode to work correctly on Windows, we would then just need to translate from UTF-8 into Windows-style UTF-16 wide characters before using the text in any Windows API. I don't know about boost::file_operation (what happens when you pass a UTF-8 encoded string to the Windows version) but it might "just work" ... we'll need a bunch of manual testing to see what we can get away with.
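The "convert the argv components one at a time" option could look roughly like the sketch below. It is an assumption-laden illustration, not the code that was committed: the names appendUtf8, utf16ToUtf8 and convertArgs are hypothetical, std::u16string stands in for the wchar_t strings that wmain() supplies on Windows, and the conversion is written by hand so the example stays portable (a Windows build might instead call the WideCharToMultiByte API with CP_UTF8).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: append one Unicode code point to 'out' as UTF-8.
inline void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert one UTF-16 argument to UTF-8.
// Surrogate pairs are combined; an unpaired surrogate becomes U+FFFD.
inline std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // consume the low surrogate as well
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // replacement character for a lone surrogate
        }
        appendUtf8(out, cp);
    }
    return out;
}

// Hypothetical helper: convert every wide argument, one component at a time,
// so the existing 8-bit option parser can consume the result unchanged.
inline std::vector<std::string> convertArgs(const std::vector<std::u16string>& wideArgs) {
    std::vector<std::string> utf8Args;
    utf8Args.reserve(wideArgs.size());
    for (const std::u16string& arg : wideArgs) {
        utf8Args.push_back(utf16ToUtf8(arg));
    }
    return utf8Args;
}
```

Converting per argument keeps the argc/argv boundaries that Windows already computed, which avoids re-implementing command-line quoting rules.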
