[SERVER-55442] Server returns invalid utf-8 in duplicate key error message after truncating user input Created: 23/Mar/21 Updated: 05/Apr/21 Resolved: 05/Apr/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.3, 4.9.0-alpha4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Oleg Pudeyev (Inactive) | Assignee: | David Storch |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | Repro: https://github.com/p-mongo/tests/tree/master/ruby-2560 |
| Sprint: | Query Execution 2021-04-19 |
| Participants: |
| Description |
|
When a unique index is defined on a collection and inserted data contains duplicates, the server includes an excerpt of the duplicate data in the error message. When the inserted data is multi-byte utf-8, the server appears to truncate it without regard for utf-8 character boundaries. Once the truncated data is incorporated into the error message, the string as a whole is no longer valid utf-8. Test code in Ruby:
The error message returned is:
The libbson utf-8 validator that the Ruby driver uses complains about it thusly:
The error message is returned as a BSON string, which according to my understanding of http://bsonspec.org/spec.html must contain valid utf-8 characters. This was reported in https://jira.mongodb.org/browse/RUBY-2560. I verified against 2.6.12, 4.4.3 and 4.9.0-alpha5 servers. |
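The failure mode described above can be sketched in Ruby as follows. This is a hypothetical illustration, not the server's truncation code: the helper names `truncate_bytes_naive` and `truncate_bytes_utf8_safe`, the sample value, and the 4-byte limit are all assumptions chosen for the example.

```ruby
# Hypothetical helpers illustrating byte-level truncation of utf-8 data.
# They are NOT the server's implementation.

# Cuts the string at a fixed byte count, ignoring character boundaries.
def truncate_bytes_naive(str, max_bytes)
  str.byteslice(0, max_bytes)
end

# Cuts at a fixed byte count, then drops trailing bytes until the
# result is valid in the string's encoding again.
def truncate_bytes_utf8_safe(str, max_bytes)
  truncated = str.byteslice(0, max_bytes)
  until truncated.valid_encoding?
    truncated = truncated.byteslice(0, truncated.bytesize - 1)
  end
  truncated
end

# Sample value (assumed): six characters, three bytes each in utf-8.
value = "\u4e2d\u6587" * 3

naive = truncate_bytes_naive(value, 4)     # one full character plus one stray byte
safe  = truncate_bytes_utf8_safe(value, 4) # only the one full character

puts naive.valid_encoding? # false
puts safe.valid_encoding?  # true
```

Backing off to the previous character boundary makes the excerpt slightly shorter than the limit, but the resulting string can be embedded in a BSON string without failing utf-8 validation.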
| Comments |
| Comment by David Storch [ 05/Apr/21 ] |
|
Closing as a duplicate of SERVER-24007. |
| Comment by Kyle Suarez [ 30/Mar/21 ] |
|
Assigning to david.storch to see if this ticket is a duplicate of another ticket. In any case, this may be a good candidate to nominate to the quick win bucket? |
| Comment by Oleg Pudeyev (Inactive) [ 23/Mar/21 ] |
|
This issue is difficult to work around in the driver because the driver parses the entire response as a whole rather than the error message individually. In an environment that validates utf-8 strings, the driver would have to fix invalid utf-8 while parsing the entire response, which could return wrong data to applications. I elaborated on this in https://jira.mongodb.org/browse/RUBY-2560?focusedCommentId=3679006&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-3679006. |
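To illustrate why repairing invalid utf-8 during parsing changes the data, here is a minimal sketch using Ruby's built-in `String#scrub`. The message text is a made-up stand-in for a truncated error message, and this is not how the Ruby driver or libbson handles the response.

```ruby
# Hypothetical truncated error message: the trailing \xE6 is the first byte
# of a three-byte character whose remaining bytes were cut off.
raw = "dup key: { name: \"\xE4\xB8\xAD\xE6\" }".dup.force_encoding(Encoding::UTF_8)

# String#scrub replaces each invalid byte sequence with U+FFFD by default.
scrubbed = raw.scrub

puts raw.valid_encoding?      # false
puts scrubbed.valid_encoding? # true
puts scrubbed == raw          # false: the repaired string differs from what was sent
```

Because the repair rewrites bytes, applying it to every string in a response would also alter any stored document value that happened to contain invalid bytes, which is the wrong-data risk mentioned above.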