[CDRIVER-3390] Support multi-byte UTF-8 characters as percent-encoded escapes in URI Created: 13/Oct/19  Updated: 28/Oct/23  Resolved: 16/Jan/20

Status: Closed
Project: C Driver
Component/s: libmongoc, uri
Affects Version/s: None
Fix Version/s: 1.16.0

Type: Bug Priority: Major - P3
Reporter: Kaitlin Mahar Assignee: Kevin Albertson
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends

 Description   

In the auth spec tests, step 4, there are various test cases defined to test SASLprep behavior.

The tests give examples of how usernames and passwords containing Unicode characters, which must be UTF-8 encoded and then percent-escaped, are specified in a connection string, e.g.:

  • mongodb://IX:I%C2%ADX@mongodb.example.com/admin
  • mongodb://%E2%85%A8:IV@mongodb.example.com/admin
  • mongodb://%E2%85%A8:I%C2%ADV@mongodb.example.com/admin
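The following C sketch shows how escapes like these are produced. It encodes one Unicode code point as UTF-8 and then writes each resulting byte as `%XX`. The helper name `pct_encode_code_point` is hypothetical and not part of libmongoc. Unlike the spec examples, which leave ASCII letters unescaped, it escapes every byte it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode code point cp as UTF-8, then write each
 * byte as an uppercase "%XX" escape into out (NUL-terminated).
 * Returns the number of characters written, or 0 if cp is not a
 * Unicode scalar value or out is too small. */
static size_t
pct_encode_code_point (uint32_t cp, char *out, size_t out_len)
{
   unsigned char buf[4];
   size_t n, i;

   assert (out);
   if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return 0; /* out of range, or a UTF-16 surrogate */
   }
   if (cp < 0x80) {
      buf[0] = (unsigned char) cp;
      n = 1;
   } else if (cp < 0x800) {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 2;
   } else if (cp < 0x10000) {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 3;
   } else {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      n = 4;
   }
   if (out_len < 3 * n + 1) {
      return 0; /* three characters per byte plus the terminator */
   }
   for (i = 0; i < n; i++) {
      snprintf (out + 3 * i, 4, "%%%02X", (unsigned) buf[i]);
   }
   return 3 * n;
}
```

For example, U+2168 (ROMAN NUMERAL NINE, Ⅸ) yields `%E2%85%A8`, and U+00AD (SOFT HYPHEN) yields `%C2%AD`. These match the escapes in the URIs above.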

Since these cases are in the spec, it seems libmongoc should be able to parse all of these URIs successfully. However, they yield the following errors, respectively:

  • Incorrect URI escapes in password
  • Incorrect URI escapes in username. Percent-encode username and password according to RFC 3986
  • Incorrect URI escapes in username. Percent-encode username and password according to RFC 3986

libmongoc does seem to implement corresponding tests that set these usernames and passwords via setters (see https://github.com/mongodb/mongo-c-driver/blob/116cde0c2ff07f59ee0afa7831577700d4beb026/src/libmongoc/tests/test-mongoc-scram.c#L515), but I cannot find tests that set them via the connection string.



 Comments   
Comment by Githook User [ 16/Jan/20 ]

Author:

Kevin Albertson (kevinAlbs) <kevin.albertson@mongodb.com>

Message: CDRIVER-3390 support multi-byte UTF-8 percent escapes
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/bda6377419b328f36bdbee62161bd953d2257c23

Comment by Kevin Albertson [ 15/Jan/20 ]

libmongoc can accept multi-byte UTF-8 sequences directly embedded in a URI. E.g.

uri = mongoc_uri_new ("mongodb://IX:\xE2\x85\xA8@localhost:27017/admin");

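Here, \xE2 and the other backslash escapes are C string-literal escapes for raw byte values, so the URI string itself contains the UTF-8 encoding of U+2168 (Ⅸ). Directly embedding is permitted, but not required, by the connection string spec's Q&A: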

Q: Can the connection string contain non-ASCII characters?
The connection string can contain non-ASCII characters. The connection string is text, which can be encoded in any way appropriate for the application (e.g. the C Driver requires you to pass it a UTF-8 encoded connection string).

However, each byte produced by a percent-encoded escape must satisfy the isprint function. URIs like mongodb://IX:I%C2%ADX@mongodb.example.com/admin are therefore rejected: %C2%AD is the two-byte UTF-8 encoding of a single character (U+00AD, SOFT HYPHEN), and neither byte is printable on its own.
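The following is a minimal reconstruction of that per-byte rule. It is a sketch, not the actual libmongoc source, and the name `escapes_are_printable` is hypothetical. It assumes the program runs in the default "C" locale. In that locale, common C libraries treat only 0x20 through 0x7E as printable, so every byte of a multi-byte UTF-8 sequence fails the check.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sketch of the per-byte rule: every %XX escape must decode
 * to a byte for which isprint() is true. Not the libmongoc source. */
static bool
escapes_are_printable (const char *s)
{
   assert (s);
   for (; *s; s++) {
      char hex[3];
      unsigned long byte;

      if (*s != '%') {
         continue; /* unescaped characters are not checked here */
      }
      if (!isxdigit ((unsigned char) s[1]) ||
          !isxdigit ((unsigned char) s[2])) {
         return false; /* malformed escape */
      }
      hex[0] = s[1];
      hex[1] = s[2];
      hex[2] = '\0';
      byte = strtoul (hex, NULL, 16);
      /* In the "C" locale, bytes >= 0x80 are not printable, so each
       * byte of a multi-byte UTF-8 sequence is rejected in isolation. */
      if (!isprint ((int) byte)) {
         return false;
      }
      s += 2; /* skip both hex digits; the loop then moves past them */
   }
   return true;
}
```

Because each escape is judged alone, the check cannot recognize that %C2 and %AD together form one valid character.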

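I believe the correct behavior is to support percent-encoded multi-byte UTF-8 characters. This would be consistent with other drivers and with RFC 3986, which recommends encoding non-ASCII characters as UTF-8 and then percent-encoding the resulting octets.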
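The C sketch below shows one possible way to implement that behavior. It first decodes every `%XX` escape to a raw byte, then validates the whole decoded buffer as well-formed UTF-8 instead of checking each byte with isprint. This is not the code from the commit linked above. The names `hex_value`, `utf8_is_valid`, and `pct_decode_utf8` are hypothetical, and rejecting `%00` is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value (char c)
{
   if (c >= '0' && c <= '9') return c - '0';
   if (c >= 'a' && c <= 'f') return c - 'a' + 10;
   if (c >= 'A' && c <= 'F') return c - 'A' + 10;
   return -1;
}

/* True if buf[0..len) is well-formed UTF-8: no stray continuation bytes,
 * no truncated or overlong sequences, no surrogates, nothing > U+10FFFF. */
static bool
utf8_is_valid (const unsigned char *buf, size_t len)
{
   size_t i = 0;

   while (i < len) {
      unsigned char c = buf[i];
      unsigned long cp;
      size_t n, k;

      if (c < 0x80) {
         i++;
         continue;
      } else if ((c & 0xE0) == 0xC0) {
         n = 2;
         cp = c & 0x1F;
      } else if ((c & 0xF0) == 0xE0) {
         n = 3;
         cp = c & 0x0F;
      } else if ((c & 0xF8) == 0xF0) {
         n = 4;
         cp = c & 0x07;
      } else {
         return false; /* stray continuation byte, or 0xF8..0xFF */
      }
      if (i + n > len) {
         return false; /* sequence truncated by end of input */
      }
      for (k = 1; k < n; k++) {
         if ((buf[i + k] & 0xC0) != 0x80) {
            return false;
         }
         cp = (cp << 6) | (buf[i + k] & 0x3F);
      }
      if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) ||
          (n == 4 && cp < 0x10000) || cp > 0x10FFFF ||
          (cp >= 0xD800 && cp <= 0xDFFF)) {
         return false; /* overlong, out of range, or surrogate */
      }
      i += n;
   }
   return true;
}

/* Percent-decode in into out (NUL-terminated), then accept the result
 * only if the decoded bytes, taken together, are well-formed UTF-8. */
static bool
pct_decode_utf8 (const char *in, char *out, size_t out_len)
{
   size_t j = 0;

   assert (in && out && out_len > 0);
   while (*in) {
      unsigned char byte;

      if (*in == '%') {
         int hi = hex_value (in[1]);
         int lo = hi < 0 ? -1 : hex_value (in[2]);

         if (lo < 0) {
            return false; /* malformed escape */
         }
         byte = (unsigned char) (hi * 16 + lo);
         in += 3;
      } else {
         byte = (unsigned char) *in++;
      }
      if (byte == 0 || j + 1 >= out_len) {
         return false; /* "%00" (assumed invalid here) or no room */
      }
      out[j++] = (char) byte;
   }
   out[j] = '\0';
   return utf8_is_valid ((const unsigned char *) out, j);
}
```

In this sketch, `%E2%85%A8` decodes to exactly the bytes of the directly embedded `\xE2\x85\xA8` form. Both spellings of a credential therefore reach SASLprep identically. Validating the decoded buffer as a whole also lets the decoder judge a character split across several escapes as a single unit.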

Generated at Wed Feb 07 21:17:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.