- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Service Arch
- Fully Compatible
- Service Arch 2023-10-02, Service Arch 2023-10-16, Service Arch 2023-10-30, Service Arch 2023-11-13
When we parse a BSON array in IDL, we check that all the field names are as expected using the general-purpose NumberParser class, and for larger arrays this can take significant time. For example, here is the parsing of an oplog entry using code from SERVER-81101 with NumberParser:
-----------------------------------------------------------------------------------
Benchmark                                          Time            CPU   Iterations
-----------------------------------------------------------------------------------
BM_ParseOplogEntryWithNoStatementId              204 ns         204 ns      2681374
BM_ParseOplogEntryWithOneStatementId             239 ns         239 ns      2939634
BM_ParseOplogEntryWithMultiStatementId/2         328 ns         328 ns      2138516
BM_ParseOplogEntryWithMultiStatementId/8         498 ns         498 ns      1405159
BM_ParseOplogEntryWithMultiStatementId/64       1985 ns        1985 ns       352795
BM_ParseOplogEntryWithMultiStatementId/512     14392 ns       14392 ns        48773
BM_ParseOplogEntryWithMultiStatementId/1000    27760 ns       27760 ns        25179
(The parameter is the number of entries in the statementID array; the first two benchmarks do not use arrays.)
Here is the same benchmark with all array field name checking removed:
-----------------------------------------------------------------------------------
Benchmark                                          Time            CPU   Iterations
-----------------------------------------------------------------------------------
BM_ParseOplogEntryWithNoStatementId              201 ns         201 ns      2642658
BM_ParseOplogEntryWithOneStatementId             236 ns         236 ns      2963950
BM_ParseOplogEntryWithMultiStatementId/2         287 ns         287 ns      2437392
BM_ParseOplogEntryWithMultiStatementId/8         399 ns         399 ns      1756477
BM_ParseOplogEntryWithMultiStatementId/64       1035 ns        1035 ns       676153
BM_ParseOplogEntryWithMultiStatementId/512      6088 ns        6088 ns       115098
BM_ParseOplogEntryWithMultiStatementId/1000    11527 ns       11526 ns        60909
And here is the same benchmark with the field name check done using the C++ "std::from_chars" function:
-----------------------------------------------------------------------------------
Benchmark                                          Time            CPU   Iterations
-----------------------------------------------------------------------------------
BM_ParseOplogEntryWithNoStatementId              204 ns         204 ns      2627978
BM_ParseOplogEntryWithOneStatementId             237 ns         237 ns      2951225
BM_ParseOplogEntryWithMultiStatementId/2         298 ns         298 ns      2350274
BM_ParseOplogEntryWithMultiStatementId/8         401 ns         401 ns      1743373
BM_ParseOplogEntryWithMultiStatementId/64       1138 ns        1138 ns       614838
BM_ParseOplogEntryWithMultiStatementId/512      7032 ns        7032 ns        99665
BM_ParseOplogEntryWithMultiStatementId/1000    13535 ns       13534 ns        51844
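For illustration only, here is a minimal sketch of what a from_chars-based check of one array field name could look like. The function name isExpectedArrayIndex and its signature are hypothetical and are not taken from the actual change; rejecting leading zeros is also my assumption about what "as expected" means.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper (not the real server code): returns true only if
// fieldName is the canonical decimal spelling of expectedIndex.
bool isExpectedArrayIndex(std::string_view fieldName, std::uint32_t expectedIndex) {
    // from_chars would parse "01" as 1, so reject empty names and leading
    // zeros up front (assumption: only canonical spellings are valid).
    if (fieldName.empty() || (fieldName.size() > 1 && fieldName.front() == '0')) {
        return false;
    }

    std::uint32_t parsed = 0;
    const char* first = fieldName.data();
    const char* last = first + fieldName.size();
    auto [ptr, ec] = std::from_chars(first, last, parsed);

    // Require a successful parse that consumed the whole name (no trailing
    // characters) and produced the index we expected at this position.
    return ec == std::errc{} && ptr == last && parsed == expectedIndex;
}
```

A caller would keep a running index while iterating over the array elements and call this once per element, failing the parse on the first mismatch.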
I tried a few other things, like encoding the expected field number and comparing that, and incrementing the expected field number represented as a string; they weren't faster than from_chars.
These timings are from my Intel workstation, so they are not very precise, but the differences are significant.
Ideally we wouldn't even have field names in BSON arrays, but I think that ship has long since sailed.
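As a rough sketch of the "increment the expected field number as a string" idea, something like the following could be compared directly against each field name. The class name DecimalStringCounter is hypothetical and this is not the code that was benchmarked; it only shows the technique.

```cpp
#include <string>
#include <string_view>

// Hypothetical illustration: keeps the expected array index as decimal text
// so each field name can be checked with a plain string comparison.
class DecimalStringCounter {
public:
    // Current value as decimal text, e.g. "0", "1", ..., "10".
    std::string_view str() const {
        return _digits;
    }

    // Adds one by propagating a carry from the least significant digit.
    void increment() {
        for (auto it = _digits.rbegin(); it != _digits.rend(); ++it) {
            if (*it != '9') {
                ++*it;  // no carry needed past this digit
                return;
            }
            *it = '0';  // 9 rolls over to 0 and the carry continues
        }
        // Every digit was 9 (e.g. "99" -> "00"), so grow by one digit.
        _digits.insert(_digits.begin(), '1');
    }

private:
    std::string _digits = "0";
};
```

With this approach the per-element check would be `fieldName == counter.str()` followed by `counter.increment()`.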
- causes: SERVER-82983 Fix ambiguity formatting DecimalCounter using libfmt in bsonelement.cpp (Closed)