|
Let's take the "find" command as an example. The command is defined here, and includes 33 fields at the time of writing this ticket. The generated parser uses a series of if / else if conditional statements to parse a BSON, by going through every element in the object:
void parseProtected(const IDLParserContext&, const OpMsgRequest& request) {
|
...
|
for (const auto& element : request.body) {
|
...
|
if (fieldName == kFilterFieldName) { ... }
|
else if (fieldName == kProjectionFieldName) { ... }
|
else if (fieldName == kSortFieldName) { ... }
|
...
|
}
|
}
|
This will translate into multiple string comparisons for each element in the BSON object (on average 16 string comparisons). As a result, parsing commands shows up as a hot-spot in performance profiles, especially for low-latency workloads (e.g. YCSB 100% read). The performance implications scales with the number of fields in a command, and the order of if/else statements.
We can reduce the cost of parsing commands (both number of instructions and CPI) by either of the following:
- Reordering if/else statements based on the likelihood of taking each branch to reduce the total number of string comparisons.
- Avoid parsing the BSON object multiple times if possible (less pressure on the cache).
- Hashing filed names at compile-time, and using the hashed values to parse commands (my PoC).
Here are some performance numbers for my PoC using micro-benchmarks:
Running build/install/bin/find_command_bm
|
Run on (8 X 3501.93 MHz CPU s)
|
CPU Caches:
|
L1 Data 48 KiB (x4)
|
L1 Instruction 32 KiB (x4)
|
L2 Unified 1280 KiB (x4)
|
L3 Unified 55296 KiB (x1)
|
Load Average: 0.86, 0.77, 0.54
|
--------------------------------------------------------------------------------------
|
Benchmark Time CPU Iterations UserCounters...
|
--------------------------------------------------------------------------------------
|
BM_ParseFindCommand 510 ns 510 ns 1373844 items_per_second=1.96008M/s
|
BM_ParseHeavyFindCommand 566 ns 566 ns 1262140 items_per_second=1.76706M/s
|
BM_ParseFindCommandPoC 287 ns 287 ns 2440443 items_per_second=3.48677M/s
|
BM_ParseHeavyFindCommandPoC 232 ns 232 ns 3011081 items_per_second=4.30192M/s
|
|