[SERVER-8495] BinData constructor for V8 stores binary data as UTF-8, mangling it Created: 10/Feb/13 Updated: 11/Jul/16 Resolved: 11/Feb/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Shell |
| Affects Version/s: | 2.4.0-rc0 |
| Fix Version/s: | 2.4.0-rc1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Simon Green | Assignee: | Tad Marshall |
| Resolution: | Done | Votes: | 0 |
| Labels: | shell |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Windows |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Execute the shell script with 2.2 running: dev:PRIMARY> var o = db.Order.findOne() dev:PRIMARY> The CSUUID matches the GUID that C# and MongoVue see. Starting 2.4-rc0 (with the same data files) produces: dev:PRIMARY> var o = db.Order.findOne() NOTE: The CSUUID is different, and the record cannot be found by the _id of the record just read. Have tried repair and rebuilding the index. |
| Participants: |
| Description |
|
Iterating a collection in the shell and updating each item fails: Mongo cannot find the item it has just read (by its _id). The collection uses CSGUIDs, and it appears that the byte order of the BinData read is being changed within the shell. The application and other tools still read and report the values correctly, so this is most likely a problem with the shell rather than the server itself. |
| Comments |
| Comment by auto [ 11/Feb/13 ] |
|
Author: Tad Marshall (tad@10gen.com)
Date: 2013-02-11T16:05:08Z
Message: Remove undocumented overload of the BinData constructor, and its
| Comment by Simon Green [ 10/Feb/13 ] |
|
Thanks for the explanation Tad! I guess that's why a few of the records did manage to match (they just happened to have all-legal bytes). |
| Comment by Tad Marshall [ 10/Feb/13 ] |
|
Hi Simon, Thank you for this bug report and for the detailed steps to reproduce. I was able to reproduce your problem exactly. Our interface code for the V8 JavaScript engine stores binary data (BinData) within V8 as a UTF-8 string, which is a poor choice because V8 validates UTF-8 and rewrites invalid byte sequences. The actual binary data in your example is:
By luck, the first four bytes are legal UTF-8 and so are not recoded. The next byte, 0xBC, would be a legal continuation (non-initial) byte in a multi-byte UTF-8 sequence, but it appears by itself, so it is converted to the Unicode replacement character U+FFFD, which is encoded in UTF-8 as EF BF BD. Because one byte (BC) becomes three (EF BF BD), the legal UTF-8 bytes 21, 03 and 4A are pushed two bytes later in the string. The same thing happens with the 0x99 byte, pushing the 4B 37 46 sequence later still. The mangled string is then truncated to 16 bytes, producing this:
It is likely that this problem was not seen when this code was written because it was originally written to work with an older version of the V8 engine that did not validate UTF-8 strings. Storing the data "as-is" rather than as UTF-8 should fix this. Thanks again for the report! Tad |
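The mangling described in the comment above can be sketched outside the shell. This is an illustration only, not the shell's actual V8 interface code. It assumes a modern JavaScript runtime that provides the standard `TextDecoder` and `TextEncoder` (for example Node.js), and the 16-byte value is hypothetical: it places the bytes named in the comment (BC, 21 03 4A, 99, 4B 37 46) where the description puts them, with invented filler bytes around them. The helper names `mangleViaUtf8`, `toHex` and `survivesUtf8` are likewise made up for this sketch.

```javascript
// Hypothetical 16-byte value, NOT the reporter's actual data.
// 0xBC and 0x99 are lone UTF-8 continuation bytes, so they are invalid here.
const original = Uint8Array.from([
  0x10, 0x20, 0x30, 0x40, 0xbc, 0x21, 0x03, 0x4a,
  0x99, 0x4b, 0x37, 0x46, 0x01, 0x02, 0x03, 0x04,
]);

// Decode the bytes as UTF-8 (each invalid byte becomes U+FFFD), re-encode
// the resulting string, then truncate to the original length.
function mangleViaUtf8(bytes) {
  const text = new TextDecoder("utf-8").decode(bytes);
  const reencoded = new TextEncoder().encode(text);
  return reencoded.slice(0, bytes.length);
}

// Format bytes as space-separated lowercase hex pairs.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// True when a value passes through the UTF-8 round trip unchanged,
// which is the "all-legal bytes" case mentioned in the thread.
function survivesUtf8(bytes) {
  const mangled = mangleViaUtf8(bytes);
  return mangled.length === bytes.length && mangled.every((b, i) => b === bytes[i]);
}

console.log(toHex(original));
// 10 20 30 40 bc 21 03 4a 99 4b 37 46 01 02 03 04
console.log(toHex(mangleViaUtf8(original)));
// 10 20 30 40 ef bf bd 21 03 4a ef bf bd 4b 37 46
console.log(survivesUtf8(original));
// false
```

In this sketch each invalid byte grows from one byte to three, so the last four bytes of the hypothetical value fall off the end after truncation. A value made up entirely of legal UTF-8 comes back unchanged, which is consistent with a few records still matching.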