- Type: Task
- Resolution: Unresolved
- Priority: Unknown
- Affects Version/s: None
- Component/s: BSON
- Ruby Drivers
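The Ruby BSON library's decoder is implemented in a C extension. Currently, the decoder calls rb_bson_byte_buffer_get_cstring to read each field name from the byte buffer, which allocates a new String for every occurrence. There is evidence that reading field names as interned strings would be faster, because identical field names would share a single frozen String instead of being allocated again for every field and every document.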
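To illustrate what interning changes, here is a minimal Ruby sketch. It is not the BSON decoder: `decode_plain` and `decode_interned` are hypothetical helpers invented for this example, and `String#-@` stands in for the C-level interning call. Unlike a C function that interns straight from the byte buffer, this Ruby version still builds a temporary String before deduplicating it, so it shows only the object-identity effect and not the allocation savings.

```ruby
# Hypothetical helper: allocate a fresh String per field name, roughly
# what a decoder does when it builds a new String for every occurrence.
def decode_plain(bytes)
  String.new(bytes, encoding: Encoding::UTF_8)
end

# Hypothetical helper: String#-@ returns a frozen, deduplicated String,
# so equal contents resolve to the same object.
def decode_interned(bytes)
  -String.new(bytes, encoding: Encoding::UTF_8)
end

# Simulate the same field name appearing in two documents of a batch.
plain_a = decode_plain("user_id")
plain_b = decode_plain("user_id")
interned_a = decode_interned("user_id")
interned_b = decode_interned("user_id")

plain_same = plain_a.equal?(plain_b)          # two separate objects
interned_same = interned_a.equal?(interned_b) # one shared object
interned_frozen = interned_a.frozen?

puts "plain shares object:    #{plain_same}"
puts "interned shares object: #{interned_same}"
puts "interned is frozen:     #{interned_frozen}"
```

One consequence worth noting is that interned field names are frozen, so any caller that mutates a decoded key in place would need to duplicate it first.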
jeff.yemin@mongodb.com provided this patch as a proof of concept:
diff --git a/ext/bson/read.c b/ext/bson/read.c
index 36d96f3..6152000 100644
--- a/ext/bson/read.c
+++ b/ext/bson/read.c
@@ -412,7 +412,15 @@ VALUE rb_bson_byte_buffer_get_hash(int argc, VALUE *argv, VALUE self){
doc = rb_funcall(cDocument, rb_intern("allocate"), 0);
while((type = pvt_get_type_byte(b)) != 0){
- VALUE field = rb_bson_byte_buffer_get_cstring(self);
+ /* Field name: read directly into an interned (deduped, frozen) String
+ * via rb_enc_interned_str so identical field names across documents and
+ * across this document's fields collapse to the same VALUE without a
+ * per-occurrence allocation. Significant on cursor batches where every
+ * doc has the same key set. */
+ int field_len = (int)pvt_strnlen(b);
+ ENSURE_BSON_READ(b, field_len);
+ VALUE field = rb_enc_interned_str(READ_PTR(b), field_len, rb_utf8_encoding());
+ b->read_position += field_len + 1;