Investigation: clean/dirty-only eviction (remove updates target/trigger) is net-negative

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines - Transactions
    • 166.049

      Question

      Can WiredTiger eviction key only on clean/dirty content, dropping the "updates" target/trigger dimension entirely? Conclusion: No. Full removal is net-negative; updates eviction is load-bearing for update-heavy workloads.

      Experiment

      Removed the entire updates eviction dimension (branch evict-remove-updates-target):

      • config eviction_updates_target / eviction_updates_trigger
      • flags WT_EVICT_CACHE_UPDATES / WT_EVICT_CACHE_UPDATES_HARD
      • bytes_updates cache/btree/page accounting and related statistics
      • mongo mirror: stop emitting eviction_updates_trigger into wiredtiger_open; drop the updates dimension from wiredtiger_cache_pressure_monitor

      WT pull_request patch is green. It required relaxing the test_truncate19 oplog-size bound from 600MB to 650MB: the change raises that test's transient peak from ~575MB to ~601MB, because truncated pages pinned by the long-running txn linger in cache longer.

      Results (sys-perf vs master 7a072ec391d)

      64 significant movers (|z| > 2): 47 regressions, 17 improvements.
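
      A minimal sketch of how a mover list like this can be bucketed. Only the |z| > 2 cut-off comes from this ticket; the Mover type, the metric names, the sample z-scores, and the assumption that the sign of z is read against a "higher is better" flag are all hypothetical and are not the analyzer's actual logic.

```python
from dataclasses import dataclass

Z_THRESHOLD = 2.0  # significance cut-off quoted in the results: |z| > 2


@dataclass(frozen=True)
class Mover:
    metric: str             # hypothetical metric name
    z: float                # z-score of the patch run against baseline history
    higher_is_better: bool  # True for throughput, False for latency


def classify(m: Mover) -> str:
    """Label a result as 'regression', 'improvement', or 'noise'."""
    if abs(m.z) <= Z_THRESHOLD:
        return "noise"
    moved_in_good_direction = (m.z > 0) == m.higher_is_better
    return "improvement" if moved_in_good_direction else "regression"


# Hypothetical sample rows, not taken from the actual analyzer output.
sample = [
    Mover("latency_p95", z=4.1, higher_is_better=False),
    Mover("ops_per_sec", z=-2.6, higher_is_better=True),
    Mover("read_latency_avg", z=-3.0, higher_is_better=False),
    Mover("insert_p50", z=0.8, higher_is_better=False),
]

counts = {}
for m in sample:
    label = classify(m)
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

      Carrying the direction flag per metric keeps a latency increase and a throughput drop in the same "regression" bucket, which is how the counts above are reported.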

      Regressions (worse):

      • tpce_locust latency +40% to +75% (avg and p95), throughput -1% to -3%
      • update-heavy YCSB: in_cache 95read5update update latency +45%, ops -6%
      • ecommerce_locust p50 +7% to +16%; find_one_and_update +9% to +12%

      Improvements (read-resident workloads):

      • linkbench2 latency -6% to -11%
      • in_cache YCSB read latency -15.8%
      • bulk_insert p50 -26%
      • tsbs bulk load +4.6%

      FTDC root-cause (tpce_locust primary, 49-min run)

      metric                             | value                   | reading
      -----------------------------------+-------------------------+----------------------------------------
      bytes in cache                     | ~11GB avg / 16.5GB peak | near-full, leaf-dominated (16.48GB)
      tracked dirty bytes                | 372MB avg / 1.74GB peak | modest, well under the 20% dirty trigger
      app-thread cache-miss reads        | 532K avg / 3.68M peak   | heavy read-back from disk
      app-thread eviction writes         | ~52s cumulative         | app threads pulled into eviction
      eviction server no-progress sleeps | 677K peak               | eviction struggling
      write/read ticket queue depth      | 47 / 19 peak            | ops stalling behind tickets

      Mechanism: the cache fills with update-laden leaf pages (exactly what bytes_updates tracked). Clean eviction skips them (not clean); dirty eviction does not reclaim them (dirty stays under trigger). They crowd the cache -> less room for the working set -> cache misses + app-thread eviction -> latency spikes. Updates eviction is the relief valve for update-heavy workloads.
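
      The mechanism can be illustrated with a toy model of the three eviction dimensions. This is a sketch, not WiredTiger code. The threshold percentages are assumed for illustration: the 20% dirty trigger is quoted in this ticket, the others follow what I understand to be the usual defaults. The snapshot values are hypothetical; in particular the updates share was not measured in this run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    target: float   # percent of cache at which soft (background) eviction starts
    trigger: float  # percent of cache at which hard pressure starts


def active_dimensions(pct, thresholds):
    """Map each dimension to 'hard', 'soft', or None for a cache snapshot."""
    state = {}
    for name, t in thresholds.items():
        used = pct[name]
        if used >= t.trigger:
            state[name] = "hard"
        elif used >= t.target:
            state[name] = "soft"
        else:
            state[name] = None
    return state


# Assumed percentages, for illustration only.
FULL = {
    "clean": Thresholds(80, 95),     # keyed on total cache fill
    "dirty": Thresholds(5, 20),      # keyed on tracked dirty bytes
    "updates": Thresholds(2.5, 10),  # keyed on bytes_updates
}
# The experiment: the updates dimension no longer exists.
CLEAN_DIRTY_ONLY = {k: v for k, v in FULL.items() if k != "updates"}

# Hypothetical snapshot: cache nearly full, little dirty data, many updates.
snapshot = {"clean": 96.0, "dirty": 4.0, "updates": 30.0}

print("master:    ", active_dimensions(snapshot, FULL))
print("experiment:", active_dimensions(snapshot, CLEAN_DIRTY_ONLY))
```

      In this model the experiment branch reports pressure only on the clean dimension, so nothing asks eviction to go after the update-laden pages, which matches the reading of the FTDC data above.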

      Confound

      The patch also removed the updates dimension from mongo wiredtiger_cache_pressure_monitor (it read the deleted cache_bytes_updates stat). That monitor drives dynamic ticket admission (the pool was resized about 1,466 times during the run); with updates pressure invisible it under-throttles, plausibly amplifying tpce latency. Not fully isolable from FTDC alone.

      Conclusion

      Clean/dirty-only eviction is insufficient. Do not remove the updates dimension.

      Possible surgical follow-up (untested hypothesis)

      Keep the hard eviction_updates_trigger (relief valve) and bytes_updates accounting; remove only the soft eviction_updates_target (worker-thread pre-emptive level). This is mongo-clean (no cache-pressure-monitor change, so the confound disappears) and tests whether the worker pre-emptive target earns its keep, or whether the app-thread trigger alone suffices. Hypothesis only – needs its own patch + comparison; the read-side wins may shrink because the trigger still evicts update pages once they exceed it.
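
      A rough way to picture the three variants (master, this experiment, and the proposed follow-up) is to ask who gets asked to evict update pages at a given updates level. The sketch below is a simplification under stated assumptions: it treats the target as the point where worker threads start pre-emptively and the trigger as the point where application threads are pulled in, as described above. The 2.5% and 10% levels and the function name are hypothetical.

```python
def responders(updates_pct, target, trigger):
    """Return who is asked to evict update pages at this updates level.

    target=None models a build without the soft target;
    trigger=None models a build without the hard trigger.
    """
    who = []
    if target is not None and updates_pct >= target:
        who.append("worker threads (pre-emptive)")
    if trigger is not None and updates_pct >= trigger:
        who.append("application threads (trigger)")
    return who


# Hypothetical levels, as percent of cache.
VARIANTS = {
    "master (target + trigger)": (2.5, 10.0),
    "this experiment (neither)": (None, None),
    "follow-up (trigger only)": (None, 10.0),
}

for name, (target, trigger) in VARIANTS.items():
    for level in (1.0, 5.0, 12.0):
        print(f"{name:28s} updates={level:4.1f}% -> {responders(level, target, trigger)}")
```

      The middle level (between target and trigger) is where the variants differ: only master does any pre-emptive work there, which is the behaviour the follow-up patch would put to the test.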

      Artifacts

      • WT patch (green): 6a1a9609
      • sys-perf patch: 6a1a9415
      • Perf comparison: analyzer
      • Branches: evict-remove-updates-target (wiredtiger), evict-remove-updates-target-perf (mongo, vendored against import commit 50c0f77)

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Haribabu Kommi