Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Cache and Eviction
Storage Engines - Transactions
163.472
SE Transactions - 2026-07-17
3
Summary
WT-15538 recommends raising eviction_updates_trigger (with success reported "as high as 95%") as the remediation for the high-update-ratio eviction problem. This ticket evaluates making 95% the default across the sys-perf suite.
Verdict (FTDC-verified): the only causal effect of the change on sys-perf is a tail-latency regression on insert-bound workloads (bulk_insert). Every other workload is unaffected - the update-eviction path never engages because updates-in-cache stays well below even the old ~10% trigger. sys-perf does not reproduce the WT-15538 regime, so it cannot demonstrate the benefit; it only surfaces the downside. 95% should not ship as a global default on the strength of sys-perf.
Change tested
eviction_updates_trigger: default 0 (auto, i.e. half of eviction_dirty_trigger, which is ~10% of cache at the 20% dirty trigger) changed to 95.
- Patch: sys-perf patch
- Replicated comparison (3-clone managed multipatch, 4 patch runs/metric): Performance Analyzer
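Below is a minimal Python sketch of how the two settings translate into an updates-in-cache threshold. It assumes only the auto rule stated above (0 means half of eviction_dirty_trigger) and the 20% dirty trigger cited in the FTDC section. The function names are hypothetical, and the sketch ignores any validation or clamping WiredTiger itself may apply to these values.

```python
# Hypothetical helpers: model only the auto rule described in this ticket
# (eviction_updates_trigger=0 -> half of eviction_dirty_trigger).

DIRTY_TRIGGER_PCT = 20.0  # eviction_dirty_trigger value cited in this ticket


def effective_updates_trigger_pct(updates_trigger: float,
                                  dirty_trigger: float = DIRTY_TRIGGER_PCT) -> float:
    """Return the effective updates trigger as a percentage of cache size."""
    if updates_trigger == 0:
        # Auto mode: half of the dirty trigger.
        return dirty_trigger / 2
    return updates_trigger


def trigger_gb(cache_gb: float, updates_trigger: float) -> float:
    """Updates-in-cache level (GB) at which the updates trigger fires."""
    return cache_gb * effective_updates_trigger_pct(updates_trigger) / 100


if __name__ == "__main__":
    cache_gb = 19.33  # cache size reported in the FTDC section
    for setting in (0, 95):
        print(f"eviction_updates_trigger={setting}: "
              f"{effective_updates_trigger_pct(setting):.0f}% of cache, "
              f"{trigger_gb(cache_gb, setting):.2f} GB")
```

With the 19.33 GB cache from the FTDC run, this gives about 1.93 GB for the old default and about 18.4 GB for 95. The second value is far above the largest updates-in-cache peak measured in this ticket (6.19 GB).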
Result: one causal effect
| Workload | Metric | Change | Z | Causal? |
|---|---|---|---|---|
| bulk_insert_w1 | 95th-percentile latency (InsertMany / Aggregated) | +13.0% | 2.18 | yes - real regression |
No other workload showed a change attributable to this knob (see FTDC below). The replicated comparison did surface several read/mixed rows at |z| > 2 (linkbench2, mixed_workloads, find_one_and_update, ycsb in_cache), but FTDC proves the trigger never engages in those workloads, so they are not effects of this change and are excluded as results.
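The exclusion rule can be stated as a two-part gate. This is a hypothetical sketch, not Performance Analyzer logic. A row counts as an effect of the change only if two things hold: the row is statistically flagged, and updates-in-cache crossed the lower of the two triggers. The second condition matters because below that level the old and new configurations behave identically with respect to update eviction. The flagged status and the updates-in-cache maxima come from this ticket; the names are illustrative.

```python
# Hypothetical triage gate (an assumption, not the Performance Analyzer's logic).

CACHE_GB = 19.33
OLD_TRIGGER_GB = CACHE_GB * 0.10  # eviction_updates_trigger=0 (auto, ~10%)
NEW_TRIGGER_GB = CACHE_GB * 0.95  # eviction_updates_trigger=95
Z_THRESHOLD = 2.0


def is_causal(flagged: bool, max_updates_gb: float) -> bool:
    """flagged: the row's |z| exceeded Z_THRESHOLD in the replicated comparison."""
    # Below the lower trigger, neither configuration evicts on update content,
    # so the knob cannot explain a difference.
    engaged = max_updates_gb > min(OLD_TRIGGER_GB, NEW_TRIGGER_GB)
    return flagged and engaged


# (workload, flagged, max updates-in-cache in GB) as reported in this ticket
ROWS = [
    ("bulk_insert_w1", abs(2.18) > Z_THRESHOLD, 6.19),
    ("linkbench2", True, 0.99),
    ("mixed_workloads", True, 0.53),
    ("find_one_and_update_embedded", True, 0.62),
    ("ycsb in_cache 95read5update", True, 0.28),
]

if __name__ == "__main__":
    for name, flagged, max_gb in ROWS:
        verdict = "causal" if is_causal(flagged, max_gb) else "excluded"
        print(f"{name}: {verdict}")
```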
FTDC verification (cache = 19.33 GB; old trigger ~1.93 GB / 10%, new ~18.4 GB / 95%)
The trigger only acts once updates-in-cache crosses the threshold. Per-workload:
| Workload (phase) | updates-in-cache (max) | % of cache | Crosses old 10% trigger? |
|---|---|---|---|
| bulk_insert_w1 (load) | 6.19 GB | 32% | yes (~3x over) |
| bulk_insert_w1 (load_with_indexes) | 3.78 GB | 20% | yes |
| linkbench2 (request_test) | 0.99 GB | 5.1% | no |
| mixed_workloads | 0.53 GB | 2.8% | no |
| find_one_and_update_embedded | 0.62 GB | 3.2% | no |
| ycsb in_cache 95read5update | 0.28 GB | 1.4% | no |
| ycsb out_of_cache 95read5update | 0.61 GB | 3.2% | no |
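The percentages and the "crosses" column can be rechecked from the GB maxima. The small sketch below assumes the 19.33 GB cache and the 10%/95% trigger percentages stated above. The GB inputs are already rounded, so a recomputed percentage may differ from the table in the last digit.

```python
CACHE_GB = 19.33
OLD_TRIGGER_PCT = 10.0  # auto default: half of the 20% dirty trigger
NEW_TRIGGER_PCT = 95.0

# Max updates-in-cache per workload phase, in GB, from the FTDC table above.
FTDC_MAX_UPDATES_GB = {
    "bulk_insert_w1 (load)": 6.19,
    "bulk_insert_w1 (load_with_indexes)": 3.78,
    "linkbench2 (request_test)": 0.99,
    "mixed_workloads": 0.53,
    "find_one_and_update_embedded": 0.62,
    "ycsb in_cache 95read5update": 0.28,
    "ycsb out_of_cache 95read5update": 0.61,
}


def pct_of_cache(gb: float, cache_gb: float = CACHE_GB) -> float:
    """Express an updates-in-cache peak as a percentage of cache size."""
    return 100.0 * gb / cache_gb


if __name__ == "__main__":
    for phase, gb in FTDC_MAX_UPDATES_GB.items():
        pct = pct_of_cache(gb)
        print(f"{phase}: {pct:.1f}% of cache, "
              f"{pct / OLD_TRIGGER_PCT:.1f}x old trigger, "
              f"crosses old={pct > OLD_TRIGGER_PCT}, new={pct > NEW_TRIGGER_PCT}")
```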
Only bulk_insert pushes updates-in-cache past the old 10% trigger. There, the old default recruited application threads to evict update content early. At 95% (~18.4 GB) the trigger is never reached, so update content accumulates to 3.8-6.2 GB and is instead flushed in bursts by dirty/general eviction, producing the +13% 95th-percentile latency spikes. This is a genuine, mechanism-backed regression.
For every other workload, updates-in-cache peaks at 1.4-5.1% of cache, so the update-eviction path is inactive under both configurations and the change is a no-op. The application-thread eviction seen in linkbench2 request_test (160k requests) is driven by dirty content reaching the 20% eviction_dirty_trigger, which this change does not touch.
Why the non-bulk_insert rows are not effects of this change
Those rows come from 4 self-consistent patch runs (low CoV) compared against the historical stable region, not against a fresh baseline (only 2 of 141 rows had a direct base value). FTDC shows the change cannot affect these workloads. The few-percent offset from the historical band is therefore attributable to base-commit differences, infrastructure changes, or drift across the stable window (Apr-May), not to the trigger change.
An earlier single-run comparison also flagged a large ycsb out_of_cache read regression (z=9.49). That was a 3-point warmup-phase artifact and did not survive replication. It had the same root cause: comparison against a stable region with no direct baseline.
Recommendation
- Do not ship 95% as a global default based on sys-perf: the only causal effect observed is the bulk_insert tail regression.
- sys-perf is the wrong test bed: no workload reproduces the WT-15538 regime (>= 72 GB cache, sustained 2000+ updates/s, update ratio > dirty ratio). Validate the benefit on a workload that actually drives updates-in-cache past the trigger (or on a HELP-ticket repro), where the high-trigger remediation has been reported to help.
- If a default change is still desired, evaluate an intermediate value and measure specifically on update-heavy workloads; weigh any gain against the bulk_insert tail-latency cost.
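The selection criterion in the second bullet can be written as a screening predicate for candidate test beds. This is a hypothetical sketch. The field names are assumptions, and so is the reading of "update ratio" and "dirty ratio" as peak updates-in-cache and peak dirty bytes expressed as fractions of cache size; neither definition comes from WT-15538.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical summary of a candidate workload (all fields are assumptions)."""
    cache_gb: float
    sustained_updates_per_s: float
    update_ratio: float  # assumed: peak updates-in-cache / cache size
    dirty_ratio: float   # assumed: peak dirty bytes / cache size


def matches_wt15538_regime(p: WorkloadProfile) -> bool:
    """True only if the profile meets all three conditions in the recommendation."""
    return (
        p.cache_gb >= 72
        and p.sustained_updates_per_s >= 2000
        and p.update_ratio > p.dirty_ratio
    )


if __name__ == "__main__":
    # Hypothetical candidate, not a measured workload.
    candidate = WorkloadProfile(cache_gb=96, sustained_updates_per_s=2500,
                                update_ratio=0.15, dirty_ratio=0.08)
    print(matches_wt15538_regime(candidate))
```

Applied to the sys-perf runs in this ticket, the predicate already fails on cache size alone (19.33 GB < 72 GB).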
Method note
Base side of the replicated comparison is the historical stable region (2/141 rows had a fresh direct baseline). Patch-side CoV across the 4 runs is tight (1-7%) and stable-region sample sizes are healthy (n=24-79), so the measurements are precise - but precision against a historical band is not the same as a causal A/B, which is why FTDC was needed to separate real effects from drift.
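The gap between precision and causality can be illustrated with synthetic numbers. The sketch assumes z is the patch mean's distance from the stable-region mean, measured in stable-region standard deviations; the Performance Analyzer's exact statistic may differ. The sample values are made up for illustration and are not taken from this run. Four tight patch runs (CoV well under 1%) that sit a constant ~5% above the band produce |z| > 3, even though nothing in the patch caused the offset.

```python
import statistics


def cov_pct(samples):
    """Coefficient of variation in percent (sample std dev / mean)."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


def z_vs_band(patch_samples, band_samples):
    """Assumed z definition: patch mean vs. historical band, in band std devs."""
    band_mean = statistics.mean(band_samples)
    band_sd = statistics.stdev(band_samples)
    return (statistics.mean(patch_samples) - band_mean) / band_sd


# Synthetic stable-region band (n=25) centred on 100.
band = [98.0, 99.0, 100.0, 101.0, 102.0] * 5
# Synthetic patch runs: tightly clustered, offset ~5% by drift or a different base commit.
patch = [104.0, 105.0, 104.5, 105.5]

if __name__ == "__main__":
    print(f"patch CoV: {cov_pct(patch):.2f}%")
    print(f"z vs band: {z_vs_band(patch, band):.2f}")
```

A low CoV only says the patch runs agree with each other. Separating a real effect from a shifted reference still needs a direct baseline or a mechanism check such as the FTDC analysis.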