Core Server / SERVER-47615

Investigate cause of index builds spending more CPU time in WT transaction commit

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: 4.4.0-rc0
    • Component/s: Index Maintenance
    • Assigned Teams: Storage Execution
    • Operating System: ALL

      Update: This regression was fixed as of 4.4.0-rc1 by SERVER-47407. This ticket tracks the investigation of the initial regression.

      By "much slower" I mean that it takes about 2X longer to create indexes for Linkbench in 4.4 compared to 4.2. Linkbench is in DSI, although I need to make a few changes to it so that the secondary index is created after the load as done here.

      By 4.4 I mean both 4.4.0-rc0 and a build from the v4.4 branch prior to the merge of durable history.

      From vmstat and iostat I see that 4.4 uses ~3X more CPU than 4.2. Both 4.2 and 4.4 do a similar amount of IO to storage – both read & write.

      By "IO-bound" I mean that the test server has 16G of RAM and the indexed collection is ~80G per db.$collection.stats and mongod was setup as a single-node replicaset.

      I will provide ftdc and CPU flamegraphs tomorrow. For now I start with a summary of performance data collected during the create index.

      Results are provided for MongoDB versions 4.2.5, 4.4.0-rc0 and 4.4pre, where 4.4pre is a build from the v4.4 branch immediately prior to the merge of durable history.

      The first table is mostly data from iostat and vmstat, explained in "Index, Per Operation" here. It is performance data divided by the insert rate, to estimate the overhead per inserted row.

      In this table, "secs" is the number of seconds for the create index, and that value is ~3X larger for 4.4pre and 4.4.0-rc0 vs 4.2.5. The CPU consumption per indexed row (cpupi) is also ~3X larger for the 4.4 binaries. But the amount of data read from and written to storage per row (rkbpi, wkbpi) is similar between 4.4 and 4.2, so this looks like a new CPU problem.

      ips     secs    rpi     rkbpi   wkbpi   cspi    cpupi   csecpq  dsecpq  csec    dsec    dbgb    cnf
      311740  2440    0.001   0.133   0.067   0.0     84      0.0     3.2     0       2421    209.8   mo425.c5
      107679  7064    0.002   0.132   0.066   0.0     239     0.0     9.3     0       7109    219.8   mo44pre.c5
      121296  6271    0.001   0.144   0.066   0.0     218     0.0     8.4     0       6402    264.4   mo440rc0.c5
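
      To make the per-row arithmetic concrete, this is a rough reconstruction of the normalization, not the script that produced the table (the real column definitions are in the doc linked above): totals from vmstat and iostat over the index build are divided by the number of indexed rows.

        # Sketch of the "Per Operation" normalization. The formulas and names
        # below are my assumptions, not the ticket's actual reporting script.
        def per_row_metrics(rows_indexed, secs, read_kb_total, write_kb_total, cpu_secs_total):
            return {
                "ips": rows_indexed / secs,                   # insert rate, rows indexed per second
                "rkbpi": read_kb_total / rows_indexed,        # KB read per indexed row
                "wkbpi": write_kb_total / rows_indexed,       # KB written per indexed row
                "cpu_us_per_row": cpu_secs_total * 1e6 / rows_indexed,  # CPU time per row
            }

        # Hypothetical totals, only to show the shape of the calculation.
        print(per_row_metrics(rows_indexed=760_000_000, secs=2440,
                              read_kb_total=100_000_000, write_kb_total=50_000_000,
                              cpu_secs_total=640))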
      

      The next table is mostly data from iostat and vmstat, explained in "Index, Per Second" here. Most of these are average values of the vmstat and iostat counters. The IO rates to and from storage (rmbps, wmbps) are much higher for 4.2.5; one reason is that 4.4 spends more CPU time between IOs, so it issues them at a lower rate.

      ips     secs    rps     rmbps   wmbps   csps    cpups   cutil   dutil   vsz     rss     cnf
      311740  2440    448     40      21      1518    26.1    0.000   0.992   13.9    12.0    mo425.c5
      107679  7064    181     14      7       1670    25.7    0.000   1.006   15.7    13.6    mo44pre.c5
      121296  6271    141     17      8       1911    26.5    0.000   1.021   17.2    14.7    mo440rc0.c5
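
      As a sanity check, the two tables agree with each other: the per-second IO rates here, divided by the insert rate, reproduce the per-row figures in the first table. For example, using the 4.2.5 row:

        # Cross-check between the two tables (4.2.5 row).
        ips = 311740            # insert rate, indexed rows per second
        rmbps, wmbps = 40, 21   # MB/s read from and written to storage

        rkbpi = rmbps * 1024 / ips   # ~0.131 KB/row, vs 0.133 in the first table
        wkbpi = wmbps * 1024 / ips   # ~0.069 KB/row, vs 0.067 in the first table
        print(rkbpi, wkbpi)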
      

        1. mo425.Apr16.085309.rep
          578 kB
        2. mo425.Apr16.085309.svg
          203 kB
        3. mo425.Apr20.222253.svg
          165 kB
        4. mo440rc0.Apr16.001809.rep
          223 kB
        5. mo440rc0.Apr16.001809.svg
          136 kB
        6. mo440rc0.Apr20.185022.svg
          315 kB
        7. mo440rc1.Apr20.131341.svg
          287 kB
        8. mo44pre.Apr15.180608.svg
          135 kB
        9. mo44pre.Apr21.145849.svg
          170 kB

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            mark.callaghan@mongodb.com Mark Callaghan (Inactive)
            Votes:
            0
            Watchers:
            11
