[CSHARP-27] Text is corrupted when receiving a string value with Unicode chars Created: 11/Mar/10  Updated: 15/Mar/10  Resolved: 15/Mar/10

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jeffrey Zhao Assignee: Sam Corder
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mono on Mac OS X (Snow Leopard), MS CLR (v2.0 & 4.0 RC)



 Description   

Issue: Text is corrupted when a received string value contains Unicode characters.

Reason: There's a bug in the BsonReader.GetString(int length) method. The method uses a 128-byte buffer to receive the string value. Each 128-byte chunk is converted to a UTF-8 string separately, but a chunk may end in the middle of a multi-byte UTF-8 character. The split character cannot be decoded correctly, so the text is corrupted whenever it contains non-ASCII characters that straddle a chunk boundary.
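
The failure mode can be reproduced outside the driver. The following is a minimal sketch, not the driver's actual code: ChunkedDecodeDemo, DecodePerChunk and BuildSample are hypothetical names, and the sample string is an assumption chosen so that a 3-byte character starts at byte offset 127 and therefore straddles the 128-byte boundary.

```csharp
using System;
using System.Text;

// Hypothetical demo class (not part of the driver).
public static class ChunkedDecodeDemo
{
    // Decodes every chunk on its own, which is the behaviour described above.
    public static string DecodePerChunk(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            // A multi-byte character cut at the chunk edge cannot be decoded here.
            sb.Append(Encoding.UTF8.GetString(data, offset, count));
        }
        return sb.ToString();
    }

    // 127 ASCII bytes followed by U+4E2D, which takes 3 bytes in UTF-8 (130 bytes total).
    public static string BuildSample()
    {
        return new string('a', 127) + "\u4e2d";
    }
}
```

With this sample, DecodePerChunk(Encoding.UTF8.GetBytes(ChunkedDecodeDemo.BuildSample()), 128) should not equal the original string, because the first chunk ends after the first byte of the final character. Decoding the same bytes as a single 130-byte chunk returns the original text.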

How to fix: Collect the complete received data in a MemoryStream before converting it to a UTF-8 string.
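
A rough sketch of the suggested fix, under assumptions: BufferedStringReader and ReadUtf8 are hypothetical names, not the driver's API, and BSON framing details such as the trailing null terminator are left out. The 128-byte read buffer is kept, but decoding happens only once, after all bytes have been collected.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not the driver's code).
public static class BufferedStringReader
{
    // Reads exactly `length` bytes from `source` in small chunks,
    // then decodes the collected bytes in a single call.
    public static string ReadUtf8(Stream source, int length)
    {
        var buffer = new byte[128];
        using (var collected = new MemoryStream(length))
        {
            int remaining = length;
            while (remaining > 0)
            {
                int read = source.Read(buffer, 0, Math.Min(buffer.Length, remaining));
                if (read == 0)
                    throw new EndOfStreamException("Stream ended before the full string was read.");
                collected.Write(buffer, 0, read);
                remaining -= read;
            }
            // Decode once, so no character is split across a chunk boundary.
            return Encoding.UTF8.GetString(collected.ToArray());
        }
    }
}
```

Another option would be a stateful System.Text.Decoder obtained from Encoding.UTF8.GetDecoder(), which carries an incomplete byte sequence over to the next call instead of requiring all bytes to be buffered first.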



 Comments   
Comment by Sam Corder [ 15/Mar/10 ]

Fixed for both fixed length and variable length strings now.

Comment by Sam Corder [ 13/Mar/10 ]

Forgot to add the code to the part that reads a known string length.

Comment by Sam Corder [ 13/Mar/10 ]

Fixed in http://github.com/samus/mongodb-csharp/commit/0e73a20d5cf20190dc210eb7f18089d60d610948. This will be in the 0.82 release.

Thanks for making me learn more about UTF-8 than I ever wanted to.

Comment by Steve Wagner [ 11/Mar/10 ]

Can you provide an example Unicode string and paste it as base64? With that I could integrate it into the test suite.
