[CDRIVER-3390] Support multi-byte UTF-8 characters as percent-encoded escapes in URI Created: 13/Oct/19 Updated: 28/Oct/23 Resolved: 16/Jan/20 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc, uri |
| Affects Version/s: | None |
| Fix Version/s: | 1.16.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaitlin Mahar | Assignee: | Kevin Albertson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Description |
|
In the auth spec tests, step 4, there are various test cases defined to test SASLprep behavior. Examples are given of how usernames and passwords containing Unicode characters, which need to be UTF-8 encoded and escaped, would be specified in a connection string, e.g.
It seems that, since these cases are in the spec, libmongoc should be able to successfully parse all of these URIs. However, these cases yield the following errors, respectively:
libmongoc does seem to implement corresponding tests that set these usernames and passwords via setters (see https://github.com/mongodb/mongo-c-driver/blob/116cde0c2ff07f59ee0afa7831577700d4beb026/src/libmongoc/tests/test-mongoc-scram.c#L515), but I cannot find tests that set them via the connection string. |
| Comments |
| Comment by Githook User [ 16/Jan/20 ] | |
|
Author: Kevin Albertson (kevinAlbs) <kevin.albertson@mongodb.com> Message: | |
| Comment by Kevin Albertson [ 15/Jan/20 ] | |
|
libmongoc can accept multi-byte UTF-8 sequences directly embedded in a URI. E.g.
Where \xE2 etc. represent literal byte values. Directly embedding is permitted, but not required, by the connection string spec's Q&A:
However, the percent-encoded escapes only permit characters matching the isprint function. So URIs like mongodb://IX:I%C2%ADX@mongodb.example.com/admin are rejected, since %C2%AD represents a two-byte UTF-8 character and neither byte passes that check on its own. I believe the correct behavior is to support percent-encoded multi-byte UTF-8 characters. This would be consistent with other drivers and with RFC 3986. |
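
To illustrate the proposed behavior, here is a minimal sketch of a percent-decoder that decodes every %XX escape to a raw byte first and then validates the whole result as UTF-8, rather than applying isprint to each decoded byte. This is not libmongoc's actual code: the names percent_decode_utf8, is_valid_utf8, and hex_value are hypothetical, and rejecting %00 is an assumption made here so the result can stay a NUL-terminated C string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int
hex_value (char c)
{
   if (c >= '0' && c <= '9') return c - '0';
   if (c >= 'a' && c <= 'f') return c - 'a' + 10;
   if (c >= 'A' && c <= 'F') return c - 'A' + 10;
   return -1;
}

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8
 * (no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool
is_valid_utf8 (const unsigned char *buf, size_t len)
{
   size_t i = 0;
   while (i < len) {
      unsigned char b = buf[i];
      unsigned long cp, min;
      size_t extra, k;

      if (b < 0x80) { i++; continue; } /* single-byte (ASCII) */
      else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
      else return false; /* stray continuation byte or invalid lead byte */

      if (len - i <= extra) return false; /* sequence is truncated */
      for (k = 1; k <= extra; k++) {
         if ((buf[i + k] & 0xC0) != 0x80) return false;
         cp = (cp << 6) | (buf[i + k] & 0x3F);
      }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
         return false;
      }
      i += extra + 1;
   }
   return true;
}

/* Decode %XX escapes in `in`. Returns a malloc'd NUL-terminated string,
 * or NULL on a malformed escape, %00 (assumption: rejected), allocation
 * failure, or a decoded result that is not valid UTF-8. */
char *
percent_decode_utf8 (const char *in)
{
   size_t len, i, o = 0;
   char *out;

   assert (in != NULL);
   len = strlen (in);
   out = malloc (len + 1); /* decoding never makes the string longer */
   if (!out) return NULL;

   for (i = 0; i < len; i++) {
      if (in[i] == '%') {
         int hi, lo;
         if (len - i < 3) { free (out); return NULL; }
         hi = hex_value (in[i + 1]);
         lo = hex_value (in[i + 2]);
         if (hi < 0 || lo < 0 || (hi == 0 && lo == 0)) {
            free (out);
            return NULL;
         }
         out[o++] = (char) ((hi << 4) | lo);
         i += 2; /* skip the two hex digits */
      } else {
         out[o++] = in[i];
      }
   }
   out[o] = '\0';

   /* Validate the decoded bytes as a whole, not one byte at a time. */
   if (!is_valid_utf8 ((const unsigned char *) out, o)) {
      free (out);
      return NULL;
   }
   return out;
}
```

With this approach, the password component I%C2%ADX from the URI above decodes to the four bytes 'I', 0xC2, 0xAD, 'X' (U+00AD between two ASCII letters), while a lone %C2 or an overlong form such as %C0%80 is still rejected. The caller owns the returned string and frees it with free.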