Description
If an error message is truncated in such a way that it is no longer valid UTF-8, an EncodingError is raised instead of a Mongo::Error::OperationFailure (or other exception). This happens when the truncation point falls on a byte in the middle of a multi-byte UTF-8 character.
Example:
class MyDocument
  include Mongoid::Document

  field :name, type: String

  index({name: 1}, {unique: true})
end

MyDocument.create_indexes

# The first insert succeeds; the identical second insert violates the unique index.
MyDocument.collection.insert_one({name: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
MyDocument.collection.insert_one({name: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
# this raises
# EncodingError (String E11000 duplicate key error collection: my_db.my_documents index: name_1 dup key: { : "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□?..." } is not valid UTF-8: bogus high bits for continuation byte)
# the truncation fell in the middle of a ° character

# Prefixing the value with "a" shifts the truncation point onto a character boundary.
MyDocument.collection.insert_one({name: "a(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
MyDocument.collection.insert_one({name: "a(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"})
# this raises
# Mongo::Error::OperationFailure (E11000 duplicate key error collection: my_db.my_documents index: name_1 dup key: { : "a(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□..." } (11000) (on 127.0.0.1:27017, legacy retry, attempt 1))
# which is expected
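The underlying effect can be reproduced in plain Ruby without a database. The following is only a minimal sketch of byte-level truncation, not the server's or the driver's actual code; the helper name truncate_bytes and the 5-byte limit are hypothetical and chosen so that the cut lands inside the two-byte "°" character. It also shows how a lossy replacement (the approach DRIVERS-2008 proposes for writeErrors) could be approximated with String#scrub.

```ruby
# Hypothetical helper: cut a string to at most `limit` bytes,
# ignoring character boundaries (the behavior that causes the bug).
def truncate_bytes(str, limit)
  str.byteslice(0, limit)
end

face = "(╯°□°)╯"

# "(" is 1 byte and "╯" is 3 bytes, so byte 5 is the first of the
# two bytes (C2 B0) that encode "°". Cutting there splits the character.
truncated = truncate_bytes(face, 5)

puts truncated.valid_encoding?   # => false

# Lossy recovery: replace the dangling byte instead of raising.
repaired = truncated.scrub("?")

puts repaired.valid_encoding?    # => true
puts repaired                    # => (╯?
```

Cutting one byte earlier (a limit of 4) lands on a character boundary and yields a valid string, which matches the observation above that prefixing the value with "a" changes which exception is raised.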
Issue Links
- is caused by
  - SERVER-24007 Server can return invalid UTF8 for error messages due to truncation in the middle of a code point (Backlog)
  - SERVER-55442 Server returns invalid utf-8 in duplicate key error message after truncating user input (Closed)
- is related to
  - DRIVERS-2008 Default to lossy/replacement behavior when decoding UTF-8 in writeErrors (Backlog)
- links to