[CDRIVER-2401] Handle UTF-8 multibyte NIL in bson_utf8_validate, and UTF-8 validate URI strings before parsing Created: 21/Nov/17 Updated: 28/Oct/23 Resolved: 23/Nov/17 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | uri |
| Affects Version/s: | None |
| Fix Version/s: | 1.9.0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Stuart Larsen (Inactive) | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | asp, asp-sdl-fuzzing, asp-vuln-dos | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
Bugs
Three minor issues if you feed the following PoCs into the "mongoc_uri_new" function.
This was against:
With ASAN on.
This is the script I used for testing:
scan_to_unichar
bson_utf8_get_char
bson_string_append_unichar
|
| Comments |
| Comment by Githook User [ 22/Nov/17 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | |
| Comment by Githook User [ 22/Nov/17 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | |
| Comment by Githook User [ 22/Nov/17 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | |
| Comment by A. Jesse Jiryu Davis [ 22/Nov/17 ] | |
|
The first string begins with "mongodb://\xe8\x03\x00". The "\xe8" should be the first byte of a three-byte character, but there aren't enough bytes left in the string. "\x00" terminates the string - no UTF-8 multibyte character includes the zero byte. Unfortunately, mongoc_uri_parse tries to iterate over each UTF-8 character after "mongodb://", searching for a "/" character, and it steps past the end of the string. I fixed it by simply UTF-8 validating the whole string in mongoc_uri_parse before splitting the string into URI segments.

The second string begins with "mongodb://\xfa". In libbson we interpret that as the first byte of a *five*-byte character! I think this means libbson doesn't implement strict UTF-8, which only allows up to 4 bytes per character. Perhaps it implements CESU-8 but I haven't investigated. Anyway, the "\xfa" is at the end of the string, so this is the same bug with the same fix as the previous entry.

The third contains "\xc0\x80" in the hostname. That's a multibyte synonym for NIL. We try to unescape the hostname in case it contains "%20" or something like that. This involves stepping over each Unicode character, checking if it is "%", and if not, then appending it to the actual hostname, using bson_string_append_unichar(). That function asserts the appended character is non-NIL, but "\xc0\x80" is NIL. The solution here is to update the behavior of this function:
The validate function should prohibit multibyte NIL the same as single-byte NIL if "allow_null" is false. (Yes, I'm confusing NULL and NIL, sorry.) | |
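To make the three cases concrete, here is a minimal, self-contained sketch of the kind of check described above. It is not libbson's implementation: sketch_seq_length and sketch_utf8_validate are hypothetical names, and the permissive lead-byte table (sequences of up to six bytes) is an assumption chosen to mirror the observation that "\xfa" is read as the lead byte of a five-byte character. The sketch rejects a sequence that runs past the end of the buffer and, when allow_null is false, rejects both the single-byte NIL and the two-byte "\xc0\x80" form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a libbson function): number of bytes a sequence
 * should occupy, judged from its lead byte. Assumes a permissive table that
 * accepts lead bytes for sequences of up to six bytes. */
static size_t
sketch_seq_length (unsigned char c)
{
   if ((c & 0x80) == 0x00) return 1;
   if ((c & 0xE0) == 0xC0) return 2;
   if ((c & 0xF0) == 0xE0) return 3;
   if ((c & 0xF8) == 0xF0) return 4;
   if ((c & 0xFC) == 0xF8) return 5;
   if ((c & 0xFE) == 0xFC) return 6;
   return 0; /* stray continuation byte, or 0xFE / 0xFF */
}

/* Hypothetical validator (not libbson's bson_utf8_validate): returns false
 * for truncated sequences, bad continuation bytes, and, when allow_null is
 * false, for NIL in its one-byte or two-byte ("\xc0\x80") encoding. */
static bool
sketch_utf8_validate (const char *s, size_t len, bool allow_null)
{
   size_t i = 0;

   assert (s != NULL);

   while (i < len) {
      unsigned char c = (unsigned char) s[i];
      size_t n = sketch_seq_length (c);
      size_t j;

      /* Invalid lead byte, or the sequence would step past the end. */
      if (n == 0 || n > len - i) {
         return false;
      }

      /* Every trailing byte must look like 10xxxxxx. */
      for (j = 1; j < n; j++) {
         if (((unsigned char) s[i + j] & 0xC0) != 0x80) {
            return false;
         }
      }

      if (!allow_null) {
         if (c == 0x00) {
            return false; /* single-byte NIL */
         }
         if (c == 0xC0 && (unsigned char) s[i + 1] == 0x80) {
            return false; /* two-byte NIL */
         }
      }

      i += n;
   }

   return true;
}
```

Running a check like this over the whole URI string before splitting it into segments would reject the first two strings because their last sequence is truncated, and the third because of the two-byte NIL when allow_null is false.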
| Comment by Githook User [ 22/Nov/17 ] | |
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: bson_utf8_validate() with allow_null=false should prohibit the UTF-8 | |
| Comment by A. Jesse Jiryu Davis [ 21/Nov/17 ] | |
|
This is terrific, Stuart. What's a PoC, is it a proof of concept? Is this required before we release 1.9.0, the first release that supports mongodb+srv URIs? |