[bson-ruby] Improve decoding performance by interning strings


    • Type: Task
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: BSON
    • Ruby Drivers

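      The Ruby BSON library is implemented as a C extension. Currently, the BSON decoder calls rb_bson_byte_buffer_get_cstring to read each field name from the byte buffer, which produces a separate String for every occurrence of a field name. There is evidence that reading field names as interned (deduplicated, frozen) strings instead would be faster, particularly when many documents share the same keys.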

      jeff.yemin@mongodb.com provided this patch as a proof of concept:

      diff --git a/ext/bson/read.c b/ext/bson/read.c
      index 36d96f3..6152000 100644
      --- a/ext/bson/read.c
      +++ b/ext/bson/read.c
      @@ -412,7 +412,15 @@ VALUE rb_bson_byte_buffer_get_hash(int argc, VALUE *argv, VALUE self){
         doc = rb_funcall(cDocument, rb_intern("allocate"), 0);
      
         while((type = pvt_get_type_byte(b)) != 0){
      -    VALUE field = rb_bson_byte_buffer_get_cstring(self);
      +    /* Field name: read directly into an interned (deduped, frozen) String
      +     * via rb_enc_interned_str so identical field names across documents and
      +     * across this document's fields collapse to the same VALUE without a
      +     * per-occurrence allocation. Significant on cursor batches where every
      +     * doc has the same key set. */
      +    int field_len = (int)pvt_strnlen(b);
      +    ENSURE_BSON_READ(b, field_len);
      +    VALUE field = rb_enc_interned_str(READ_PTR(b), field_len, rb_utf8_encoding());
      +    b->read_position += field_len + 1;
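
The effect the patch is after can be illustrated at the Ruby level. The following is a minimal sketch, not part of the patch: it uses String#-@, which returns a frozen, deduplicated String, as a rough Ruby-side analogue of rb_enc_interned_str. The helper fresh_name and the field name "user_id" are hypothetical and exist only for this illustration. Unlike the C function, this Ruby version still allocates a temporary String before deduplicating it, so it shows the object-sharing behaviour rather than the allocation savings.

```ruby
# Hypothetical helper: builds a new String from raw bytes at runtime,
# standing in for a decoder that reads a field name from a buffer.
def fresh_name(bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

raw = "user_id".bytes

# Without interning: every read yields a distinct String object.
plain_a = fresh_name(raw)
plain_b = fresh_name(raw)

# With interning: String#-@ returns the shared frozen instance for
# strings with identical content and encoding.
interned_a = -fresh_name(raw)
interned_b = -fresh_name(raw)

puts "plain strings share one object:    #{plain_a.equal?(plain_b)}"
puts "interned strings share one object: #{interned_a.equal?(interned_b)}"
puts "interned string is frozen:         #{interned_a.frozen?}"
```

Because interned strings are frozen, any caller that mutates decoded field names in place would need to duplicate them first; that is worth checking before adopting the change.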
      

            Assignee: Unassigned
            Reporter: Jamis Buck
            Votes: 0
            Watchers: 2

              Created:
              Updated: