WiredTiger / WT-6175

tcmalloc fragmentation is worse in 4.4 with durable history


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.4.0-rc4
    • Fix Version/s: WT10.0.0, 4.4.0-rc10, 4.7.0
    • Component/s: None
    • Labels:
    • Case:

      Description

      Issue Status as of Nov 2, 2020

      ISSUE DESCRIPTION AND IMPACT

      The accumulation of many small data structures (typically associated with inserts and updates) in the WiredTiger cache can cause the system's memory allocator to use more space than WiredTiger requests. Historically, the main mechanism for limiting the impact of this fragmentation has been to cap the amount of dirty data that can accumulate in the cache at 20%. The precise limit can be controlled using the eviction_dirty_trigger configuration option.
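As a rough illustration of what that cap means in practice (the 40 GB cache size comes from the test case in the original description below; the helper function is illustrative, not WiredTiger code):

```python
# Rough sketch: the dirty-data ceiling implied by eviction_dirty_trigger.
# The 20% default and the 40 GB cache size are taken from this ticket;
# this is illustrative arithmetic, not WiredTiger code.

def dirty_ceiling_bytes(cache_size_bytes: int, dirty_trigger_pct: float) -> int:
    """Bytes of dirty data at which eviction kicks in."""
    return int(cache_size_bytes * dirty_trigger_pct / 100)

cache = 40 * 1024**3  # cacheSizeGB=40
print(dirty_ceiling_bytes(cache, 20) / 1024**3)  # prints 8.0 (GiB)
```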

      However, some WiredTiger cache pages with many associated small memory allocations can remain in cache after a checkpoint, at which point they are marked as clean pages. The clean/dirty distinction helps limit the amount of work done in checkpoints, but for that reason it is only a rough proxy for memory allocator fragmentation: clean pages can still pin many small allocations.

      With the introduction of durable history in MongoDB 4.4, the small memory allocations associated with these structures contribute more to fragmentation than in previous versions.

      To address this, we are now:

      • Tracking insert and update data structures as a separate attribute of cache usage.
      • Extending the cache eviction process to manage the proportion of cache associated with small allocations, similarly to how it manages clean and dirty content.
      • Adding a configurable trigger (eviction_updates_trigger) on the proportion of cache consumed by these small objects, to prompt eviction of that content. The default value is eviction_dirty_trigger / 2 (10%).
      • Adding a configurable target (eviction_updates_target) to serve as the goal for that eviction. The default value is eviction_dirty_target / 2 (10%).
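The default derivation described in the last two bullets can be sketched as follows (update_defaults is an illustrative helper, not a WiredTiger API; the 20% inputs are the dirty-eviction values quoted in this ticket):

```python
# Sketch of the defaults described above: the new updates trigger/target
# default to half of the corresponding dirty-eviction settings.
# update_defaults() is an illustrative helper, not a WiredTiger API.

def update_defaults(dirty_trigger: float, dirty_target: float) -> dict:
    return {
        "eviction_updates_trigger": dirty_trigger / 2,
        "eviction_updates_target": dirty_target / 2,
    }

# With the dirty-eviction values quoted in this ticket (20%):
print(update_defaults(20, 20))
# -> {'eviction_updates_trigger': 10.0, 'eviction_updates_target': 10.0}
```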

      DIAGNOSIS AND AFFECTED VERSIONS

      This change is introduced in WT3.2.2 and MongoDB 4.4+.

      A deployment running with the default configuration and servicing workloads that generate a large number of small objects may be governed more by the new updates triggers than by the generic dirty triggers. If this occurs, you will notice that cache dirty % tends toward the eviction_updates_target of 10% rather than the eviction_dirty_target of 20%.
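A minimal sketch of this diagnosis, assuming you have the wiredTiger.cache section of serverStatus output in hand (the stat names below match 4.4-era output but should be verified against your build; the numbers are made up):

```python
# Minimal diagnosis sketch: compute dirty % and updates % of the cache
# from a serverStatus()["wiredTiger"]["cache"] dict. The stat names are
# taken from 4.4-era output and should be verified against your build.

def cache_pressure(cache_stats: dict) -> dict:
    total = cache_stats["maximum bytes configured"]
    dirty = cache_stats["tracked dirty bytes in the cache"]
    updates = cache_stats["bytes allocated for updates"]
    return {
        "dirty_pct": 100.0 * dirty / total,
        "updates_pct": 100.0 * updates / total,
    }

# Example with made-up numbers for a 40 GiB cache:
stats = {
    "maximum bytes configured": 40 * 1024**3,
    "tracked dirty bytes in the cache": 4 * 1024**3,
    "bytes allocated for updates": 4 * 1024**3,
}
# A dirty % hovering near 10% rather than 20% suggests the updates
# trigger, not the dirty trigger, is governing eviction.
print(cache_pressure(stats))
# -> {'dirty_pct': 10.0, 'updates_pct': 10.0}
```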

      REMEDIATION AND WORKAROUNDS

      These changes in eviction behavior are expected and should be evaluated in the context of how clients of the MongoDB server are affected, if at all.

      Original description

      This isn't new with 4.4.0-rc4; it has been an issue in all of the 4.4 release candidates I tried. HELP-13660 has a possible explanation for the trigger: 1) modify many documents and then 2) do queries that require long-running scans.

      My test case is Linkbench with a large database. The workload is 1) load the database, 2) create a secondary index on one of the collections, and 3) run transactions. The problem happens at step 2, which does a scan during index creation. The test database is ~200G with Snappy compression and WiredTiger has cacheSizeGB=40.

      I dump tcmalloc stats after each step. Much more detail is here and the summary is listed below.

      For 4.4.0-rc4, VSZ for the mongod process is ~9G larger after create index compared to VSZ for 4.2.6 or 4.4 prior to the durable history merge.

      This can be reproduced with the Linkbench2 workload in DSI, although:
      1) it will have to be changed to create the secondary index after the load;
      2) I use maxid1=200M while the code in DSI now uses maxid1=10M.

      I am not sure whether Henrik added a repro to DSI for this when he did the work leading to HELP-13660.

        Attachments

        1. 3stacks.png (155 kB)
        2. comparison.png (177 kB)
        3. fragmentation.png (156 kB)
        4. growth.png (245 kB)
        5. hpe.426.tar (44.26 MB)
        6. linkbench-10G.png (529 kB)
        7. metrics.2020-05-08T14-09-24Z-00000.r1 (9.93 MB)
        8. metrics.2020-05-08T20-17-36Z-00000.r1 (10.00 MB)
        9. metrics.2020-05-09T00-53-05Z-00000.r1 (621 kB)
        10. metrics.interim (190 kB)
        11. metrics.interim.r1 (22 kB)
        12. repro-32-5G.png (367 kB)
        13. wt6175.lb200m.may14.tar (48.80 MB)

          Issue Links

            Activity

              People

              Assignee:
              michael.cahill Michael Cahill
              Reporter:
              mark.callaghan Mark Callaghan (Inactive)
               Votes:
               0
               Watchers:
               25

                Dates

                Created:
                Updated:
                Resolved: