Evaluate defaulting eviction_updates_trigger to 95%


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Cache and Eviction
    • None
    • Storage Engines - Transactions
    • 163.472
    • SE Transactions - 2026-07-17
    • 3

      Summary

      WT-15538 recommends raising eviction_updates_trigger (with success reported "as high as 95%") as the remediation for the high-update-ratio eviction problem. This ticket evaluates making 95% the default across the sys-perf suite.

      Verdict (FTDC-verified): the only causal effect of the change on sys-perf is a tail-latency regression on insert-bound workloads (bulk_insert). Every other workload is unaffected - the update-eviction path never engages because updates-in-cache stays far below the trigger. sys-perf does not reproduce the WT-15538 regime, so it cannot demonstrate the benefit; it only surfaces the downside. 95% should not ship as a global default on the strength of sys-perf.

      Change tested

      The default for eviction_updates_trigger was changed from 0 (auto: half of eviction_dirty_trigger, i.e. ~10% of cache) to 95.
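      As a rough illustration of the "auto" rule described above, the sketch below resolves a configured value to an effective percentage of cache. It is a simplified model, not WiredTiger code: the function name and the dirty_trigger_pct parameter are hypothetical, the 20% dirty trigger is the value quoted later in this ticket, and only percentage-style settings are handled.

```python
def effective_updates_trigger_pct(updates_trigger_pct: float,
                                  dirty_trigger_pct: float = 20.0) -> float:
    """Resolve the updates trigger to a percentage of cache.

    Simplified model (assumption): a configured value of 0 means "auto",
    which is half of the dirty trigger. Any other value is used as-is.
    """
    if updates_trigger_pct == 0:
        return dirty_trigger_pct / 2.0
    return updates_trigger_pct


# Old default (auto) versus the value evaluated in this ticket.
old_pct = effective_updates_trigger_pct(0)
new_pct = effective_updates_trigger_pct(95)
print(f"old effective trigger: {old_pct}% of cache")
print(f"new effective trigger: {new_pct}% of cache")
```

      Under these assumptions the old default resolves to 10% of cache, which matches the "~10%" figure above.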

      Result: one causal effect

      Workload       | Metric                         | Change | Z    | Causal?
      bulk_insert_w1 | 95th (InsertMany / Aggregated) | +13.0% | 2.18 | yes - real regression

      No other workload showed a change attributable to this knob (see FTDC below). The replicated comparison did surface several read/mixed rows at |z| > 2 (linkbench2, mixed_workloads, find_one_and_update, ycsb in_cache), but FTDC proves the trigger never engages in those workloads, so they are not effects of this change and are excluded as results.

      FTDC verification (cache = 19.33 GB; old trigger ~1.93 GB / 10%, new ~18.4 GB / 95%)

      The trigger only acts once updates-in-cache crosses the threshold. Per-workload:

      Workload (phase)                   | updates-in-cache (max) | % of cache | Crosses old 10% trigger?
      bulk_insert_w1 (load)              | 6.19 GB                | 32%        | yes (~3x over)
      bulk_insert_w1 (load_with_indexes) | 3.78 GB                | 20%        | yes
      linkbench2 (request_test)          | 0.99 GB                | 5.1%       | no
      mixed_workloads                    | 0.53 GB                | 2.8%       | no
      find_one_and_update_embedded       | 0.62 GB                | 3.2%       | no
      ycsb in_cache 95read5update        | 0.28 GB                | 1.4%       | no
      ycsb out_of_cache 95read5update    | 0.61 GB                | 3.2%       | no

      Only bulk_insert pushes updates-in-cache past the old 10% trigger. There the old default recruited application threads to evict update content early. Raising the trigger to 95% disables that, so update content accumulates to 3-6 GB and is flushed in bursts via dirty/general eviction, which produces the +13% 95th-percentile latency spikes. This is a genuine, mechanism-backed regression.

      For every other workload updates stay at 1.4-5.1% of cache, so the update-eviction path is inactive in both configurations and the change is a no-op. The application-thread eviction seen in linkbench request_test (160k requests) is driven by dirty content reaching the 20% dirty trigger, which this change does not touch.
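      The threshold check behind the FTDC table can be reproduced with a few lines of arithmetic. The sketch below uses the cache size and peak updates-in-cache values reported in this ticket. The helper names are hypothetical, and the comparison is a plain "peak exceeds threshold" test, not a model of WiredTiger's eviction logic.

```python
CACHE_GB = 19.33  # cache size reported in the FTDC section

# (workload/phase, peak updates-in-cache in GB), copied from the table above.
OBSERVED = [
    ("bulk_insert_w1 (load)", 6.19),
    ("bulk_insert_w1 (load_with_indexes)", 3.78),
    ("linkbench2 (request_test)", 0.99),
    ("mixed_workloads", 0.53),
    ("find_one_and_update_embedded", 0.62),
    ("ycsb in_cache 95read5update", 0.28),
    ("ycsb out_of_cache 95read5update", 0.61),
]


def threshold_gb(cache_gb: float, trigger_pct: float) -> float:
    """Convert a trigger percentage into an absolute size in GB."""
    return cache_gb * trigger_pct / 100.0


def workloads_crossing(observed, cache_gb: float, trigger_pct: float):
    """Return the workloads whose peak updates-in-cache exceeds the trigger."""
    limit = threshold_gb(cache_gb, trigger_pct)
    return [name for name, peak_gb in observed if peak_gb > limit]


crossing_old = workloads_crossing(OBSERVED, CACHE_GB, 10.0)
crossing_new = workloads_crossing(OBSERVED, CACHE_GB, 95.0)
print("crosses old 10% trigger:", crossing_old)
print("crosses new 95% trigger:", crossing_new)
```

      With these inputs only the two bulk_insert phases exceed the old threshold, and nothing reaches the new one, which is consistent with the update-eviction path going idle under the 95% setting.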

      Why the non-bulk_insert rows are not effects of this change

      Those rows are 4 self-consistent patch runs (low CoV) compared against the historical stable region, not a fresh baseline (only 2 of 141 rows had a direct base value). Since FTDC shows the change cannot affect these workloads, the few-percent offset from the historical band is attributable to base-commit / infra / drift over the stable window (Apr-May), not to the trigger change.

      An earlier single-run comparison also flagged a large ycsb out_of_cache read regression (z=9.49); that was a 3-point warmup-phase artifact and did not survive replication. Same root cause: comparison against a stable region with no direct baseline.

      Recommendation

      • Do not ship 95% as a global default based on sys-perf: the only causal effect observed is the bulk_insert tail regression.
      • sys-perf is the wrong test bed - no workload reproduces the WT-15538 regime (>= 72 GB cache, sustained 2000+ updates/s, update ratio > dirty ratio). Validate the benefit on a workload that actually drives updates-in-cache past the trigger (or a HELP-ticket repro), where the high-trigger remediation is known to help.
      • If a default change is still desired, evaluate an intermediate value and measure specifically on update-heavy workloads; weigh any gain against the bulk_insert tail-latency cost.

      Method note

      Base side of the replicated comparison is the historical stable region (2/141 rows had a fresh direct baseline). Patch-side CoV across the 4 runs is tight (1-7%) and stable-region sample sizes are healthy (n=24-79), so the measurements are precise - but precision against a historical band is not the same as a causal A/B, which is why FTDC was needed to separate real effects from drift.
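      To make the distinction between precision and causality concrete, the sketch below computes a patch-side CoV and a z-score against a stable-region band. The formulas are an assumption (the ticket does not state how Z is computed): CoV is taken as sample standard deviation over mean, and z as the patch mean's distance from the stable mean in units of the stable-region standard deviation. All numbers are made up for illustration and are not sys-perf data.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (assumed CoV definition)."""
    return stdev(samples) / mean(samples)


def z_vs_stable_band(patch_runs, stable_samples):
    """Distance of the patch mean from the stable-region mean, in units of the
    stable-region sample standard deviation (assumed z definition)."""
    return (mean(patch_runs) - mean(stable_samples)) / stdev(stable_samples)


# Hypothetical values: four tight patch runs versus a historical band.
patch_runs = [103.0, 105.0, 104.0, 104.0]
stable_samples = [98.0, 100.0, 102.0]

patch_cov = coefficient_of_variation(patch_runs)
z = z_vs_stable_band(patch_runs, stable_samples)
print(f"patch CoV = {patch_cov:.4f}, z vs stable band = {z:.2f}")
```

      A low CoV and a large |z| only show that the patch runs sit consistently away from the historical band. They do not show that the patch caused the offset, which is why the FTDC check was needed.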

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Haribabu Kommi