[SERVER-81876] Improve IDL code generation for command parsers Created: 04/Oct/23  Updated: 24/Jan/24  Resolved: 11/Jan/24

Status: Closed
Project: Core Server
Component/s: IDL
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Dominic Hernandez
Resolution: Won't Do Votes: 1
Labels: perf-8.0, perf-tiger, perf-tiger-handoff, perf-tiger-q4
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-82940 Use tries for IDL enumeration lookups Closed
Related
related to SERVER-83671 Implement lookup tries for IDL parse ... In Progress
related to SERVER-82940 Use tries for IDL enumeration lookups Closed
is related to SERVER-78439 Investigate IDL generated parser perf... Closed
Assigned Teams:
Service Arch
Sprint: Service Arch 2024-01-22
Participants:

 Description   

Let's take the "find" command as an example. The command is defined here, and includes 33 fields at the time of writing this ticket. The generated parser uses a series of if / else if conditional statements to parse a BSON, by going through every element in the object:

void parseProtected(const IDLParserContext&, const OpMsgRequest& request) {
    ...
    for (const auto& element : request.body) {
        ...
        if (fieldName == kFilterFieldName) { ... }
        else if (fieldName == kProjectionFieldName) { ... }
        else if (fieldName == kSortFieldName) { ... }
        ...
    }
}

This will translate into multiple string comparisons for each element in the BSON object (on average 16 string comparisons). As a result, parsing commands shows up as a hot-spot in performance profiles, especially for low-latency workloads (e.g. YCSB 100% read). The performance implications scales with the number of fields in a command, and the order of if/else statements.

We can reduce the cost of parsing commands (both number of instructions and CPI) by either of the following:

  • Reordering if/else statements based on the likelihood of taking each branch to reduce the total number of string comparisons.
  • Avoid parsing the BSON object multiple times if possible (less pressure on the cache).
  • Hashing filed names at compile-time, and using the hashed values to parse commands (my PoC).

Here are some performance numbers for my PoC using micro-benchmarks:

Running build/install/bin/find_command_bm
Run on (8 X 3501.93 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1280 KiB (x4)
  L3 Unified 55296 KiB (x1)
Load Average: 0.86, 0.77, 0.54
--------------------------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
BM_ParseFindCommand                510 ns          510 ns      1373844 items_per_second=1.96008M/s
BM_ParseHeavyFindCommand           566 ns          566 ns      1262140 items_per_second=1.76706M/s
BM_ParseFindCommandPoC             287 ns          287 ns      2440443 items_per_second=3.48677M/s
BM_ParseHeavyFindCommandPoC        232 ns          232 ns      3011081 items_per_second=4.30192M/s



 Comments   
Comment by Jason Chan [ 11/Jan/24 ]

Re-closing this ticket in-lieu of SERVER-83671

Comment by Jason Chan [ 29/Nov/23 ]

amirsaman.memaripour@mongodb.com do we think it makes sense to add this to PM-3598?

Generated at Thu Feb 08 06:47:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.