Details
Type: Bug
Status: Closed
Priority: Major - P3
Resolution: Duplicate
Affects Version/s: 4.4.3, 4.9.0-alpha4
None
None
None
Operating System: ALL
Sprint: Query Execution 2021-04-19
Description
When a collection has a unique index and an insert would create a duplicate key, the server includes an excerpt of the duplicate key value in the error message.
When the inserted data contains multi-byte UTF-8 characters, the server appears to truncate the excerpt without regard for UTF-8 character boundaries, so the cut can fall in the middle of a multi-byte character. Once the truncated data is incorporated into the error message, the message string as a whole is no longer valid UTF-8.
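The suspected failure mode can be modeled in a few lines of Ruby. This is a minimal sketch, not the server's code: `naive_truncate` and `safe_truncate` are hypothetical helpers, and the 10-byte limit is arbitrary. Cutting at a fixed byte offset can leave the lead byte of a multi-byte character without its continuation bytes, while backing the cut off to the previous character boundary keeps the result valid UTF-8.

```ruby
# Hypothetical helpers modeling byte-limited truncation of UTF-8 text.
# They are NOT the MongoDB server's code; they only illustrate the suspected bug.

# Cuts at exactly max_bytes bytes, ignoring character boundaries.
def naive_truncate(str, max_bytes)
  str.byteslice(0, max_bytes)
end

# Cuts at most max_bytes bytes, backing off so no character is split.
def safe_truncate(str, max_bytes)
  return str.dup if str.bytesize <= max_bytes

  cut = max_bytes
  # A continuation byte looks like 10xxxxxx. If the byte at the cut position is
  # one, the character it belongs to started earlier, so move the cut back.
  cut -= 1 while cut > 0 && (str.getbyte(cut) & 0xC0) == 0x80
  str.byteslice(0, cut)
end

rep = '(╯°□°)╯︵ ┻━┻'
naive = naive_truncate(rep, 10) + '...'
safe  = safe_truncate(rep, 10) + '...'

puts naive.valid_encoding? # => false ("°" keeps only its first byte)
puts safe.valid_encoding?  # => true
puts safe                  # => (╯°□...
```

For valid input, the backward scan checks at most three extra bytes, since a UTF-8 character is at most four bytes long.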
Test code in Ruby:
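require 'mongo'

# No database is given, so the driver uses the default 'admin' database (hence admin.foo below).
client = Mongo::Client.new(['localhost:14400'])

# Start from an empty collection with a unique index on k.
client['foo'].drop
client['foo'].indexes.create_one({k: 1}, unique: true)

# Multi-byte UTF-8 payload, repeated so the key is long enough to be truncated in the error message.
rep = '(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻'

client['foo'].insert_one(k: rep*10)
# The second insert violates the unique index and triggers the E11000 error.
client['foo'].insert_one(k: rep*10)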
The error message returned is:
E11000 duplicate key error collection: admin.foo index: k_1 dup key: { k: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□�..." }
The libbson UTF-8 validator used by the Ruby driver rejects it as follows:
/home/w/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/bson-4.12.0/lib/bson/hash.rb:111:in `get_hash': String E11000 duplicate key error collection: admin.foo index: k_1 dup key: { k: "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□�..." } is not valid UTF-8: bogus high bits for continuation byte (EncodingError)
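The validator's message refers to UTF-8 continuation bytes. As a simplified sketch of that kind of structural check (a hypothetical `utf8_error` helper written for illustration, not libbson's or bson-ruby's implementation), the code below reads each lead byte to learn how many continuation bytes (bit pattern `10xxxxxx`) must follow, and reports the first violation. Unlike a complete validator, it does not reject overlong encodings, surrogates, or code points above U+10FFFF. The excerpt it checks mimics the message above, where the cut appears to fall inside a `°` and the `...` marker follows immediately.

```ruby
# Simplified UTF-8 structure check, written only for illustration.
# It checks lead/continuation byte patterns but does not reject overlong
# encodings, surrogates, or code points above U+10FFFF.

# Number of continuation bytes implied by a lead byte, or nil if b cannot start a character.
def continuation_count(b)
  if b < 0x80 then 0                # 0xxxxxxx: ASCII
  elsif (b & 0xE0) == 0xC0 then 1   # 110xxxxx: 2-byte character
  elsif (b & 0xF0) == 0xE0 then 2   # 1110xxxx: 3-byte character
  elsif (b & 0xF8) == 0xF0 then 3   # 11110xxx: 4-byte character
  end
end

# Returns a description of the first structural error, or nil if none is found.
def utf8_error(bytes)
  i = 0
  while i < bytes.size
    needed = continuation_count(bytes[i])
    return "invalid lead byte at offset #{i}" if needed.nil?

    (1..needed).each do |k|
      j = i + k
      return "truncated sequence at offset #{i}" if j >= bytes.size
      # Every continuation byte must carry the high bits 10.
      return "bad continuation byte at offset #{j}" unless (bytes[j] & 0xC0) == 0x80
    end
    i += needed + 1
  end
  nil
end

# Mimic the server's excerpt: cut inside the second "°", then append "...".
excerpt = '(╯°□°)'.byteslice(0, 10) + '...'
p utf8_error(excerpt.bytes)          # => "bad continuation byte at offset 10"
p utf8_error('(╯°□°)╯︵ ┻━┻'.bytes)  # => nil
```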
The error message is returned as a BSON string, which, according to my understanding of http://bsonspec.org/spec.html, must contain valid UTF-8.
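For reference, the BSON specification defines a string value as `int32 (byte*) "\x00"`, where the `int32` counts the UTF-8 bytes plus the trailing NUL. The sketch below encodes a value in that layout and refuses input that is not well-formed UTF-8; `encode_bson_string` is a hypothetical name, not bson-ruby's API. A string truncated mid-character, like the error message above, fails that check.

```ruby
# Hypothetical encoder for a BSON string value (not bson-ruby's API), following
# the bsonspec.org grammar:  string ::= int32 (byte*) "\x00"
# where int32 is the number of UTF-8 bytes plus 1 for the trailing NUL.
def encode_bson_string(str)
  # Refuse anything that is not well-formed UTF-8, e.g. text cut mid-character.
  unless str.encoding == Encoding::UTF_8 && str.valid_encoding?
    raise EncodingError, 'BSON strings must be valid UTF-8'
  end

  payload = str.b                             # raw bytes, binary-tagged
  length = [payload.bytesize + 1].pack('l<')  # little-endian int32
  length + payload + "\x00".b
end

p encode_bson_string('°').bytes # => [3, 0, 0, 0, 194, 176, 0]

begin
  encode_bson_string('(╯°□°)'.byteslice(0, 10))
rescue EncodingError => e
  puts "rejected: #{e.message}" # => rejected: BSON strings must be valid UTF-8
end
```

The length prefix counts bytes, not characters: `'°'` is one character but two bytes, hence the prefix value 3.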
This was reported in https://jira.mongodb.org/browse/RUBY-2560. I verified the behavior against 2.6.12, 4.4.3, and 4.9.0-alpha5 servers.
Issue Links
causes:
- RUBY-2560 EncodingError raised when server returns invalid UTF-8 in error messages derived from user input (Backlog)
duplicates:
- SERVER-24007 Server can return invalid UTF8 for error messages due to truncation in the middle of a code point (Backlog)