[CDRIVER-2401] Handle UTF-8 multibyte NIL in bson_utf8_validate, and UTF-8 validate URI strings before parsing Created: 21/Nov/17  Updated: 28/Oct/23  Resolved: 23/Nov/17

Status: Closed
Project: C Driver
Component/s: uri
Affects Version/s: None
Fix Version/s: 1.9.0

Type: Bug Priority: Minor - P4
Reporter: Stuart Larsen (Inactive) Assignee: A. Jesse Jiryu Davis
Resolution: Fixed Votes: 0
Labels: asp, asp-sdl-fuzzing, asp-vuln-dos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to CDRIVER-2403 Does libbson implement UTF-8 or CESU-8? Closed

 Description   

Bugs

Three minor issues appear if you feed the following PoCs into the "mongoc_uri_new" function.

This was against:
https://github.com/mongodb/mongo-c-driver/releases/download/1.8.2/mongo-c-driver-1.8.2.tar.gz

With ASAN on.

This is the script I used for testing:
https://gist.github.com/c0nrad/760fd1d34e39b7ed8f4442c622c90160

scan_to_unichar

READ of size 1
#7 0x000000000041c2ec in scan_to_unichar (terminators=<optimized out>, end=<synthetic pointer>, match=64, str=0x60200000ec50 "\350\003") at src/mongoc/mongoc-uri.c:159
PoC
0000000 6f6d 676e 646f 3a62 2f2f 03e8 0000 686c
0000010 736f 3a74 3732 3130 2f37 6574 7473 723f
0000020 7065 696c 6163 6573 3d74 6f66 006f
000002d

bson_utf8_get_char

READ of size 1
#7 0x00000000004763db in bson_utf8_get_char (utf8=utf8@entry=0x60200000ec30 "\372") at src/bson/bson-utf8.c:367
PoC:
0000000 6f6d 676e 646f 3a62 2f2f 00fa fa00 686c
0000010 736f 3a74 3732 3130 2f37 6574 7473 723f
0000020 7065 696c 6163 6573 3d74 6f66 006f
000002d

bson_string_append_unichar

precondition failed: unichar
#2 0x0000000000471ed2 in bson_string_append_unichar (string=string@entry=0x60200000ebf0, unichar=<optimized out>) at src/bson/bson-string.c:232
#3 0x0000000000412529 in mongoc_uri_unescape (escaped_string=escaped_string@entry=0x60200000ec10 "loca01te\332\213\300\200") at src/mongoc/mongoc-uri.c:1683
#4 0x0000000000412eff in mongoc_uri_do_unescape (str=<synthetic pointer>) at src/mongoc/mongoc-uri.c:76
#5 mongoc_uri_parse_host (uri=<optimized out>, str=<optimized out>, downcase=<optimized out>) at src/mongoc/mongoc-uri.c:367
PoC:
0000000 6f6d 676e 646f 3a62 2f2f 6f6c 6163 3130
0000010 6574 8bda 80c0 ff00 31ff 6574 8bda 8dc0
0000020 4063 6573 3d74 6f66 7361 0073
000002b



 Comments   
Comment by Githook User [ 22/Nov/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-2401 delete temporary comment
Branch: master
https://github.com/mongodb/libbson/commit/155ad7c7a676f531ec10bdaec33c7f8e04a59fc1

Comment by Githook User [ 22/Nov/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-2401 test ASAN with GCC, as well as Clang
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/176b6643a6e801a24668e00f884700db0dbad581

Comment by Githook User [ 22/Nov/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-2401 validate whole URI as UTF-8
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/f4e8af4f80d2b20912a6c96407e1c20fe798c5f7

Comment by A. Jesse Jiryu Davis [ 22/Nov/17 ]

The first string begins with "mongodb://\xe8\x03\x00". The "\xe8" should be the first byte of a three-byte character, but there aren't enough bytes left in the string: "\x00" terminates the string, and no UTF-8 multibyte character includes the zero byte. Unfortunately, mongoc_uri_parse tries to iterate over each UTF-8 character after "mongodb://", searching for a "/" character, and it steps past the end of the string. I fixed it by UTF-8 validating the whole string in mongoc_uri_parse before splitting it into URI segments.

The second string begins with "mongodb://\xfa". In libbson we interpret that as the first byte of a *five*-byte character! I think this means libbson doesn't implement strict UTF-8, which only allows up to 4 bytes per character. Perhaps it implements CESU-8 but I haven't investigated. Anyway, the "\xfa" is at the end of the string, so this is the same bug with the same fix as the previous entry.
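
To make the first two cases concrete, here is a minimal sketch of a bounds-aware scan. It is not the libbson or libmongoc code; the names sketch_seq_length and sketch_sequences_complete are hypothetical, and the lead-byte table assumes the older scheme of up to six bytes per character so that "\xfa" maps to a five-byte sequence as described above. The point is only that the length implied by a lead byte must be compared against the bytes remaining before the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libbson code): number of bytes implied by a
 * lead byte under the older 1-to-6-byte scheme. Returns 0 for a byte that
 * cannot start a sequence (a continuation byte, 0xfe or 0xff). */
static size_t
sketch_seq_length (unsigned char lead)
{
   if ((lead & 0x80) == 0x00) return 1;
   if ((lead & 0xe0) == 0xc0) return 2;
   if ((lead & 0xf0) == 0xe0) return 3;
   if ((lead & 0xf8) == 0xf0) return 4;
   if ((lead & 0xfc) == 0xf8) return 5;
   if ((lead & 0xfe) == 0xfc) return 6;
   return 0;
}

/* Hypothetical helper: returns 1 if every multibyte sequence in the
 * NUL-terminated string has all of its continuation bytes before the
 * terminator, and 0 otherwise. It never reads past the terminator. */
static int
sketch_sequences_complete (const char *str)
{
   const unsigned char *p = (const unsigned char *) str;
   size_t remaining = strlen (str);

   while (remaining > 0) {
      size_t n = sketch_seq_length (*p);
      size_t i;

      /* Reject a sequence that claims more bytes than the string has. */
      if (n == 0 || n > remaining) {
         return 0;
      }
      /* Every byte after the lead byte must look like 10xxxxxx. */
      for (i = 1; i < n; i++) {
         if ((p[i] & 0xc0) != 0x80) {
            return 0;
         }
      }
      p += n;
      remaining -= n;
   }
   return 1;
}
```

With this sketch, "mongodb://\xe8\x03" and "mongodb://\xfa" are both rejected because the lead byte asks for more bytes than remain, while a plain ASCII URI passes.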

The third contains "\xc0\x80" in the hostname. That's a two-byte (overlong) synonym for NIL. We try to unescape the hostname in case it contains "%20" or something like that. This involves stepping over each Unicode character, checking if it is "%", and if not, appending it to the actual hostname using bson_string_append_unichar(). That function asserts the appended character is non-NIL, but "\xc0\x80" decodes to NIL.

The solution here is to update the behavior of this function:

bson_utf8_validate (str, strlen (str), false /* allow_null */)

The validate function should prohibit multibyte NIL the same as single-byte NIL if "allow_null" is false. (Yes, I'm confusing NULL and NIL, sorry.)
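
As an illustration of why "\xc0\x80" counts as NIL, and of the kind of check the validator needs, here is a small sketch. It is not the libbson implementation; sketch_decode_two_byte and sketch_contains_nul are hypothetical names, and the scan assumes it is acceptable to look for the raw byte pair 0xC0 0x80, which is safe because 0xC0 is never a continuation byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decode a two-byte sequence (110xxxxx 10xxxxxx)
 * into its code point by joining the 5 and 6 payload bits. */
static unsigned
sketch_decode_two_byte (unsigned char b0, unsigned char b1)
{
   return ((unsigned) (b0 & 0x1f) << 6) | (unsigned) (b1 & 0x3f);
}

/* Hypothetical helper (not libbson code): returns true if the byte range
 * contains U+0000, either as a raw 0x00 byte or as the two-byte form
 * 0xC0 0x80. A validator called with allow_null == false would fail the
 * string in both cases. */
static bool
sketch_contains_nul (const char *utf8, size_t len)
{
   const unsigned char *p = (const unsigned char *) utf8;
   size_t i;

   for (i = 0; i < len; i++) {
      if (p[i] == 0x00) {
         return true;
      }
      /* Check the next byte only when it is inside the range. */
      if (p[i] == 0xc0 && i + 1 < len && p[i + 1] == 0x80) {
         return true;
      }
   }
   return false;
}
```

Here sketch_decode_two_byte (0xc0, 0x80) yields 0, which is the value that trips the precondition in bson_string_append_unichar().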

Comment by Githook User [ 22/Nov/17 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-2401 check for UTF-8 two-byte NULL

bson_utf8_validate() with allow_null=false should prohibit the UTF-8
two-byte code for NULL, as well as the single-byte NULL.
Branch: master
https://github.com/mongodb/libbson/commit/b4bcd00967706502e53c948e740a2c503e2c6f79

Comment by A. Jesse Jiryu Davis [ 21/Nov/17 ]

This is terrific Stuart. What's a PoC, is it a proof of concept?

Is this required before we release 1.9.0, the first release that supports mongodb+srv URIs?

Generated at Wed Feb 07 21:15:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.