  1. Core Server
  2. SERVER-81191

Checking array field names in IDL is expensive

    • Service Arch
    • Fully Compatible
    • Service Arch 2023-10-02, Service Arch 2023-10-16, Service Arch 2023-10-30, Service Arch 2023-11-13

      When we parse a BSON array in IDL, we check that all the field names are as expected using the general-purpose NumberParser class; for larger arrays this can take significant time. For example, here are the timings for parsing an oplog entry using code from SERVER-81101 with NumberParser:

      --------------------------------------------------------------------------------------
      Benchmark                                            Time             CPU   Iterations
      --------------------------------------------------------------------------------------
      BM_ParseOplogEntryWithNoStatementId                204 ns          204 ns      2681374
      BM_ParseOplogEntryWithOneStatementId               239 ns          239 ns      2939634
      BM_ParseOplogEntryWithMultiStatementId/2           328 ns          328 ns      2138516
      BM_ParseOplogEntryWithMultiStatementId/8           498 ns          498 ns      1405159
      BM_ParseOplogEntryWithMultiStatementId/64         1985 ns         1985 ns       352795
      BM_ParseOplogEntryWithMultiStatementId/512       14392 ns        14392 ns        48773
      BM_ParseOplogEntryWithMultiStatementId/1000      27760 ns        27760 ns        25179
      

      (The parameter is the number of entries in the statementID array; the first two benchmarks are not using arrays.)

      Here is the same benchmark with all array field name checking removed:

      --------------------------------------------------------------------------------------
      Benchmark                                            Time             CPU   Iterations
      --------------------------------------------------------------------------------------
      BM_ParseOplogEntryWithNoStatementId                201 ns          201 ns      2642658
      BM_ParseOplogEntryWithOneStatementId               236 ns          236 ns      2963950
      BM_ParseOplogEntryWithMultiStatementId/2           287 ns          287 ns      2437392
      BM_ParseOplogEntryWithMultiStatementId/8           399 ns          399 ns      1756477
      BM_ParseOplogEntryWithMultiStatementId/64         1035 ns         1035 ns       676153
      BM_ParseOplogEntryWithMultiStatementId/512        6088 ns         6088 ns       115098
      BM_ParseOplogEntryWithMultiStatementId/1000      11527 ns        11526 ns        60909
      

      And here is the same benchmark with the field name check done using the C++ std::from_chars function:

      --------------------------------------------------------------------------------------
      Benchmark                                            Time             CPU   Iterations
      --------------------------------------------------------------------------------------
      BM_ParseOplogEntryWithNoStatementId                204 ns          204 ns      2627978
      BM_ParseOplogEntryWithOneStatementId               237 ns          237 ns      2951225
      BM_ParseOplogEntryWithMultiStatementId/2           298 ns          298 ns      2350274
      BM_ParseOplogEntryWithMultiStatementId/8           401 ns          401 ns      1743373
      BM_ParseOplogEntryWithMultiStatementId/64         1138 ns         1138 ns       614838
      BM_ParseOplogEntryWithMultiStatementId/512        7032 ns         7032 ns        99665
      BM_ParseOplogEntryWithMultiStatementId/1000      13535 ns        13534 ns        51844
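
      For illustration, a check of this kind could look roughly like the sketch below. This is not the code from the patch: the helper name isExpectedArrayIndex and its signature are hypothetical, and the only library call used is std::from_chars from <charconv>. It parses the field name as an unsigned decimal integer and compares it with the index the parser expects next. Note that this sketch accepts leading zeros ("01" matches index 1), which a stricter check might reject.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper (not from the server code base): returns true if
// `fieldName` parses completely as the decimal number `expectedIndex`.
bool isExpectedArrayIndex(std::string_view fieldName, std::uint32_t expectedIndex) {
    std::uint32_t parsed = 0;
    const char* begin = fieldName.data();
    const char* end = begin + fieldName.size();

    // std::from_chars does not allocate, does not consult the locale, and
    // for unsigned types does not accept a leading '+' or '-'.
    auto [ptr, ec] = std::from_chars(begin, end, parsed);

    // Require a successful parse, no trailing characters, and a matching value.
    return ec == std::errc{} && ptr == end && parsed == expectedIndex;
}
```

      A caller would keep a running counter while iterating over the array elements and fail the parse on the first element for which the helper returns false.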
      

      I tried a few other things like encoding the expected field number and comparing that, and incrementing the expected field number represented as a string; they weren't faster than from_chars.
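
      As a rough sketch of what the "increment the expected field number as a string" alternative might look like (the class name ExpectedArrayKey and its members are hypothetical, and this is not the code that was benchmarked), the expected key is kept as decimal text, compared directly against each field name, and then advanced in place:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: tracks the next expected array key as decimal text,
// so each check is a plain string comparison with no integer parsing.
class ExpectedArrayKey {
public:
    // True if `fieldName` is exactly the currently expected key.
    bool matches(std::string_view fieldName) const {
        return fieldName == _digits;
    }

    // Advance the key by one: "9" -> "10", "199" -> "200", and so on.
    void advance() {
        for (auto it = _digits.rbegin(); it != _digits.rend(); ++it) {
            if (*it != '9') {
                ++*it;  // No carry needed; done.
                return;
            }
            *it = '0';  // Carry into the next more significant digit.
        }
        // Every digit was '9', so the number grows by one digit.
        _digits.insert(_digits.begin(), '1');
    }

private:
    std::string _digits = "0";
};
```

      Unlike a from_chars-based check, this comparison rejects keys with leading zeros, since only the canonical decimal text matches.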

      These timings are from my Intel workstation, so they are not very precise, but the differences are significant.

      Ideally we wouldn't even have field names in BSON arrays, but I think that ship has long since sailed.

            Assignee:
            patrick.freed@mongodb.com Patrick Freed
            Reporter:
            matthew.russotto@mongodb.com Matthew Russotto
