Details
Improvement
Resolution: Fixed
Major - P3
Service Arch
Fully Compatible
Service Arch 2023-10-02, Service Arch 2023-10-16, Service Arch 2023-10-30, Service Arch 2023-11-13
Description
When we parse a BSON array in IDL, we check that all the field names are as expected using the general-purpose NumberParser class; for larger arrays this can take significant time. For example, here are benchmark results for parsing an oplog entry, using code from SERVER-81101, with NumberParser:
--------------------------------------------------------------------------------------
Benchmark                                             Time           CPU    Iterations
--------------------------------------------------------------------------------------
BM_ParseOplogEntryWithNoStatementId                 204 ns        204 ns       2681374
BM_ParseOplogEntryWithOneStatementId                239 ns        239 ns       2939634
BM_ParseOplogEntryWithMultiStatementId/2            328 ns        328 ns       2138516
BM_ParseOplogEntryWithMultiStatementId/8            498 ns        498 ns       1405159
BM_ParseOplogEntryWithMultiStatementId/64          1985 ns       1985 ns        352795
BM_ParseOplogEntryWithMultiStatementId/512        14392 ns      14392 ns         48773
BM_ParseOplogEntryWithMultiStatementId/1000       27760 ns      27760 ns         25179
(The parameter is the number of entries in the statementID array; the first two benchmarks are not using arrays.)
Here is the same benchmark with all array field name checking removed:
--------------------------------------------------------------------------------------
Benchmark                                             Time           CPU    Iterations
--------------------------------------------------------------------------------------
BM_ParseOplogEntryWithNoStatementId                 201 ns        201 ns       2642658
BM_ParseOplogEntryWithOneStatementId                236 ns        236 ns       2963950
BM_ParseOplogEntryWithMultiStatementId/2            287 ns        287 ns       2437392
BM_ParseOplogEntryWithMultiStatementId/8            399 ns        399 ns       1756477
BM_ParseOplogEntryWithMultiStatementId/64          1035 ns       1035 ns        676153
BM_ParseOplogEntryWithMultiStatementId/512         6088 ns       6088 ns        115098
BM_ParseOplogEntryWithMultiStatementId/1000       11527 ns      11526 ns         60909
And here is the same benchmark with the field name check done using the C++ std::from_chars function:
--------------------------------------------------------------------------------------
Benchmark                                             Time           CPU    Iterations
--------------------------------------------------------------------------------------
BM_ParseOplogEntryWithNoStatementId                 204 ns        204 ns       2627978
BM_ParseOplogEntryWithOneStatementId                237 ns        237 ns       2951225
BM_ParseOplogEntryWithMultiStatementId/2            298 ns        298 ns       2350274
BM_ParseOplogEntryWithMultiStatementId/8            401 ns        401 ns       1743373
BM_ParseOplogEntryWithMultiStatementId/64          1138 ns       1138 ns        614838
BM_ParseOplogEntryWithMultiStatementId/512         7032 ns       7032 ns         99665
BM_ParseOplogEntryWithMultiStatementId/1000       13535 ns      13534 ns         51844
I tried a few other things like encoding the expected field number and comparing that, and incrementing the expected field number represented as a string; they weren't faster than from_chars.
These timings are on my Intel workstation so not too precise, but the differences are significant.
Ideally we wouldn't even have field names in BSON arrays but I think that ship has long since sailed.
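For illustration, the from_chars-based field name check discussed above might look roughly like the sketch below. This is not the code from the actual patch: the function names isExpectedArrayFieldName and allFieldNamesSequential are hypothetical, the field names are modelled as plain std::string_view values rather than BSON elements, and the rejection of leading zeros is an assumption about what "as expected" means for array keys ("0", "1", "2", ...).

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: returns true if `name` is exactly the canonical
// decimal spelling of `expected` (e.g. "0", "1", "42").
bool isExpectedArrayFieldName(std::string_view name, std::uint32_t expected) {
    std::uint32_t parsed = 0;
    const char* first = name.data();
    const char* last = first + name.size();

    // std::from_chars does no locale handling and no allocation. For an
    // unsigned type it does not accept a leading '+' or '-'.
    auto [ptr, ec] = std::from_chars(first, last, parsed);

    // Reject empty input, overflow, and trailing garbage such as "1a".
    if (ec != std::errc{} || ptr != last) {
        return false;
    }
    // from_chars accepts leading zeros ("007"); assume those are not valid keys.
    if (name.size() > 1 && name.front() == '0') {
        return false;
    }
    return parsed == expected;
}

// Hypothetical stand-in for walking the elements of a BSON array:
// checks that the names are "0", "1", "2", ... in order.
bool allFieldNamesSequential(const std::vector<std::string_view>& names) {
    std::uint32_t expected = 0;
    for (std::string_view name : names) {
        if (!isExpectedArrayFieldName(name, expected++)) {
            return false;
        }
    }
    return true;
}
```

The check still runs once per array element, which is consistent with the from_chars timings above remaining somewhat higher than the run with no checking at all.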
Issue Links
- causes: SERVER-82983 Fix ambiguity formatting DecimalCounter using libfmt in bsonelement.cpp (Closed)