[CSHARP-996] ERROR Unable to translate bytes at index N from specified code page to Unicode. Created: 16/Jun/14  Updated: 04/Apr/15  Resolved: 04/Apr/15

Status: Closed
Project: C# Driver
Component/s: API, BSON
Affects Version/s: 1.8.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Maxim Sidorenko Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mongo DB version : 2.6.1
OS : Windows 8



 Description   

We encountered the following exception while reading from the database, and we have not been able to reproduce it since. All read/write operations against MongoDB are performed with the C# MongoDB driver 1.8.3; the MongoDB version is 2.6.1.

ERROR Unable to translate bytes [AB] at index 106 from specified code page to Unicode.
Exception: System.Text.DecoderFallbackException
Message: Unable to translate bytes [AB] at index 106 from specified code page to Unicode.
Source: mscorlib
at System.Text.DecoderExceptionFallbackBuffer.Throw(Byte[] bytesUnknown, Int32 index)
at System.Text.DecoderExceptionFallbackBuffer.Fallback(Byte[] bytesUnknown, Int32 index)
at System.Text.DecoderFallbackBuffer.InternalFallback(Byte[] bytes, Byte* pBytes)
at System.Text.UTF8Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
at System.String.CreateStringFromEncoding(Byte* bytes, Int32 byteLength, Encoding encoding)
at System.Text.UTF8Encoding.GetString(Byte[] bytes, Int32 index, Int32 count)
at MongoDB.Bson.IO.BsonBuffer.DecodeUtf8String(UTF8Encoding encoding, Byte[] buffer, Int32 index, Int32 count)
at MongoDB.Bson.IO.BsonBuffer.ReadString(UTF8Encoding encoding)
at MongoDB.Bson.IO.BsonBinaryReader.ReadString()
at MongoDB.Bson.Serialization.Serializers.BsonStringSerializer.Deserialize(BsonReader bsonReader, Type nominalType, Type actualType, IBsonSerializationOptions options)
at MongoDB.Bson.Serialization.Serializers.BsonBaseSerializer.Deserialize(BsonReader bsonReader, Type nominalType, IBsonSerializationOptions options)
at MongoDB.Bson.Serialization.Serializers.BsonValueSerializer.Deserialize(BsonReader bsonReader, Type nominalType, Type actualType, IBsonSerializationOptions options)
at MongoDB.Bson.Serialization.Serializers.BsonBaseSerializer.Deserialize(BsonReader bsonReader, Type nominalType, IBsonSerializationOptions options)
at MongoDB.Bson.Serialization.Serializers.BsonDocumentSerializer.Deserialize(BsonReader bsonReader, Type nominalType, Type actualType, IBsonSerializationOptions options)
at MongoDB.Bson.Serialization.Serializers.BsonBaseSerializer.Deserialize(BsonReader bsonReader, Type nominalType, IBsonSerializationOptions options)
at MongoDB.Driver.Internal.MongoReplyMessage`1.ReadFrom(BsonBuffer buffer, IBsonSerializationOptions serializationOptions)
at MongoDB.Driver.Internal.MongoConnection.ReceiveMessage[TDocument](BsonBinaryReaderSettings readerSettings, IBsonSerializer serializer, IBsonSerializationOptions serializationOptions)
at MongoDB.Driver.Operations.QueryOperation`1.GetFirstBatch(IConnectionProvider connectionProvider)
at MongoDB.Driver.Operations.QueryOperation`1.<Execute>d__0.MoveNext()
at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
at MongoDB.Driver.MongoCollection.FindOneAs[TDocument](IMongoQuery query)
at MongoDB.Driver.MongoCollection.FindOneByIdAs[TDocument](BsonValue id)



 Comments   
Comment by Robert Stam [ 16/Jun/14 ]

String data is stored in MongoDB as UTF8 strings. This exception occurs when the driver decodes a UTF8 encoded string into a C# string (which is held in memory as UTF-16).

Somehow a document with invalid UTF8 has been introduced to the database. If all documents were inserted using the C# driver this should not be possible, since the C# driver uses the same UTF8 encoder to encode the stored UTF8 strings as to decode them.
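
The exception can be reproduced outside the driver with nothing but System.Text. The following is a minimal sketch, not the driver's own code: the helper name TryDecodeStrict and the sample bytes are hypothetical (0xAB is the byte named in the reported exception, the bytes around it are made up for illustration).

```csharp
using System;
using System.Text;

public static class StrictDecodeDemo
{
    // Hypothetical helper: returns true when the bytes decode cleanly
    // under a strict UTF8 decoder, false when they are invalid.
    public static bool TryDecodeStrict(byte[] bytes, out string text)
    {
        // Second constructor argument is throwOnInvalidBytes; true means strict.
        var strict = new UTF8Encoding(false, true);
        try
        {
            text = strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            // Same exception type as in the stack trace above.
            text = null;
            return false;
        }
    }
}
```

Calling TryDecodeStrict with the bytes 0x61, 0xAB, 0x62 returns false, because 0xAB is a UTF8 continuation byte with no lead byte before it.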

The last time I encountered this issue the problematic UTF8 data had been inserted into the database using mongoimport.

If necessary you can configure the C# driver to be more lenient in its UTF8 decoding. You can do that by setting the ReadEncoding property in either MongoClientSettings, MongoDatabaseSettings or MongoCollectionSettings.

By default the C# driver uses a strict encoder (which is why you are getting this exception when the UTF8 data is invalid). You can configure a lenient encoder like this:

var throwOnInvalidBytes = false;
clientSettings.ReadEncoding = new UTF8Encoding(false, throwOnInvalidBytes);

A lenient encoding will do the best it can to decode UTF8, even if it is invalid. Any invalid UTF8 bytes are decoded to the Unicode replacement character (U+FFFD), a special character that represents an "invalid character".
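
To see what lenient decoding produces, here is a small sketch using only System.Text. The helper name DecodeLenient and the sample bytes are assumptions made for illustration; it does not involve the driver.

```csharp
using System.Text;

public static class LenientDecodeDemo
{
    // Hypothetical helper: decodes bytes with throwOnInvalidBytes = false,
    // so invalid sequences are replaced instead of causing an exception.
    public static string DecodeLenient(byte[] bytes)
    {
        var lenient = new UTF8Encoding(false, false);
        return lenient.GetString(bytes);
    }
}
```

For the bytes 0x61, 0xAB, 0x62 this returns a three-character string: 'a', U+FFFD, 'b'.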

Keep in mind that if you use a lenient encoding, read a document containing invalid UTF8, and then save it back to the database, the stored bytes will change slightly: the original invalid UTF8 data is replaced by the valid UTF8 representation of the "invalid character" Unicode code point (U+FFFD).
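
The change in the stored bytes can be illustrated with a decode/encode round trip. This is only a sketch of the encoding behaviour using System.Text, with a hypothetical helper name RoundTrip and made-up sample bytes; it does not read from or write to a database.

```csharp
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: decodes leniently, then re-encodes,
    // mimicking a read followed by a save of the same string.
    public static byte[] RoundTrip(byte[] original)
    {
        var lenient = new UTF8Encoding(false, false);
        string decoded = lenient.GetString(original); // invalid bytes become U+FFFD
        return lenient.GetBytes(decoded);             // U+FFFD is written as EF BF BD
    }
}
```

With the input bytes 61 AB 62, the result is 61 EF BF BD 62: the single invalid byte has become the three-byte UTF8 encoding of U+FFFD, so the original data cannot be recovered afterwards.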
