Explore optimized string comparison strategies for bson::getField()

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • 200

       

           SERVER-120255 optimized bson::getField() in SBE (values/bson.h) by replacing byte-by-byte strcmp with strlen() + length check + memcmp. This improved long-field-name performance (~73% faster for 64-char names) but regressed short-field-name workloads (BF-41932) due to unnecessary strlen() calls on miss paths.
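The two comparison strategies described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from values/bson.h: the helper names matchByteByByte and matchStrlenMemcmp are hypothetical, and both assume the BSON field name is NUL-terminated and the requested name contains no embedded NUL bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: byte-by-byte comparison of a NUL-terminated BSON field
// name against the requested name. It stops at the first mismatching byte, so
// a miss on a short name costs very little.
inline bool matchByteByByte(const char* bsonName, std::string_view wanted) {
    std::size_t i = 0;
    for (; i < wanted.size(); ++i) {
        if (bsonName[i] != wanted[i]) {
            return false;  // also taken when bsonName ends early (NUL != wanted[i])
        }
    }
    return bsonName[i] == '\0';  // reject BSON names that are longer than wanted
}

// Hypothetical helper: strlen + length check + memcmp. strlen always walks the
// entire BSON name, even when the first byte already differs, which is the
// extra miss-path cost the ticket attributes the regression to.
inline bool matchStrlenMemcmp(const char* bsonName, std::string_view wanted) {
    const std::size_t len = std::strlen(bsonName);
    return len == wanted.size() && std::memcmp(bsonName, wanted.data(), len) == 0;
}
```

For a miss such as comparing "a" against "b", the first helper returns after one byte, while the second still pays for a full strlen() call before the length check can reject the name.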

           SERVER-121006 reworks the microbenchmark to be more representative (200 cases across 5 field name sizes, 4 document shapes, 5 presence patterns, 2 value types) and implements an initial hybrid approach.

           This ticket covers three implementation experiments suggested during code review to further optimize the miss path, particularly for short field names.

           Experiment 1: 8-byte load with zero-byte detection

           Load 8 bytes from the BSON field name and use the hasZeroByte bit trick (same technique as getStringLength() in sbe/value.h) to detect whether the field name is shorter than 8 characters. If so, use a specialized fast strlen + memcmp path; otherwise fall back to the standard strlen + memcmp.
           while (at least 8 bytes left) {
               uint64_t haystackNext = 0;
               memcpy(&haystackNext, be + 1, 8);
               bool hasZeroByte = ~((((haystackNext & 0x7F7F7F7F7F7F7F7F) + 0x7F7F7F7F7F7F7F7F) | haystackNext) | 0x7F7F7F7F7F7F7F7F);
               // short path vs long path based on hasZeroByte
           }
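A self-contained sketch of the zero-byte test may help when evaluating this experiment. The function names hasZeroByte and nameShorterThan8 are hypothetical stand-ins (the real helper lives near getStringLength() in sbe/value.h and may differ), and the caller is assumed to guarantee that 8 bytes are readable at the given pointer.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when at least one of the 8 bytes in v is 0x00.
// Per byte: (b & 0x7F) + 0x7F sets the high bit iff the low 7 bits are
// nonzero (no carry can cross into the next byte, since 0x7F + 0x7F = 0xFE).
// OR-ing with v also sets the high bit when b's own high bit is set.
// OR-ing with 0x7F fills the low bits, so the complement leaves 0x80 exactly
// in the positions where the original byte was zero.
inline bool hasZeroByte(std::uint64_t v) {
    constexpr std::uint64_t k7F = 0x7F7F7F7F7F7F7F7FULL;
    return ~((((v & k7F) + k7F) | v) | k7F) != 0;
}

// Hypothetical wrapper: true when the NUL-terminated name starting at p has
// fewer than 8 characters, i.e. its terminator falls inside the first 8 bytes.
// Assumes the caller has checked that 8 bytes are readable at p.
inline bool nameShorterThan8(const char* p) {
    std::uint64_t word = 0;
    std::memcpy(&word, p, sizeof(word));  // unaligned-safe 8-byte load
    return hasZeroByte(word);
}
```

The test itself does not depend on byte order, because it only asks whether any byte is zero, not which one.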

           Experiment 2: Integer arithmetic comparison (XOR + mask)

           Instead of separate strlen + memcmp, do a single 8-byte load and compare using integer arithmetic:
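           ((needle ^ haystack) & ((1ULL << (len * 8)) - 1)) == 0

           The outer parentheses are required because == binds tighter than & in C++, and 1ULL keeps the shift in 64 bits. len must stay below 8 (a 64-bit shift by 64 is undefined); counting the terminating NUL in len also rejects longer field names that merely share the prefix.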

           This combines length detection and comparison into one load + arithmetic sequence. Trade-off: more ALU work vs. fewer memory operations.
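The XOR + mask idea can be sketched as below. This is a sketch under stated assumptions rather than a proposed implementation: equalFirstBytes is a hypothetical name, the target is assumed to be little-endian (so the low bits of the loaded word hold the first bytes in memory), both buffers are assumed to have 8 readable bytes, and len is assumed to be between 1 and 7.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: compares the first len bytes of two 8-byte buffers
// using one load per side plus integer arithmetic.
// Assumptions: little-endian target, 8 readable bytes at each pointer,
// and 1 <= len <= 7 so that the shift below stays within 64 bits.
inline bool equalFirstBytes(const char* haystackBytes,
                            const char* needleBytes,
                            std::size_t len) {
    std::uint64_t haystack = 0;
    std::uint64_t needle = 0;
    std::memcpy(&haystack, haystackBytes, 8);
    std::memcpy(&needle, needleBytes, 8);
    // Keep only the low len*8 bits, i.e. the first len bytes in memory.
    const std::uint64_t mask = (std::uint64_t{1} << (len * 8)) - 1;
    // XOR is zero exactly where the two words agree.
    return ((needle ^ haystack) & mask) == 0;
}
```

Passing the name length plus one, so that the terminating NUL is part of the compared bytes, makes a single call reject both different names and longer names that only share a prefix. Passing just the name length would accept "abcd" when looking for "abc".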

       

      Important: Consider merging with BSONObj::getField() in bsonobj.cpp

      For additional context please see the following PR discussion.

            Assignee:
            Unassigned
            Reporter:
            Catalin Sumanaru
            Votes:
            0
            Watchers:
            3
