Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Engines - Transactions
166.049
Question
Can WiredTiger eviction key only on clean/dirty content, dropping the "updates" target/trigger dimension entirely? Conclusion: No. Full removal is net-negative; updates eviction is load-bearing for update-heavy workloads.
Experiment
Removed the entire updates eviction dimension (branch evict-remove-updates-target):
- config eviction_updates_target / eviction_updates_trigger
- flags WT_EVICT_CACHE_UPDATES / WT_EVICT_CACHE_UPDATES_HARD
- bytes_updates cache/btree/page accounting and related statistics
- mongo mirror: stop emitting eviction_updates_trigger into wiredtiger_open; drop the updates dimension from wiredtiger_cache_pressure_monitor
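A simplified Python model makes the removed dimension concrete. It is a sketch under stated assumptions, not WiredTiger's actual eviction code:

- Thresholds are percentages of the configured cache size, with illustrative values. Only the 20% dirty trigger comes from this ticket.
- The CLEAN/DIRTY flag names are assumed counterparts of the two updates flags listed above.
- `EvictionThresholds` and `eviction_flags` are hypothetical names.
- Setting both updates thresholds to `None` stands in for the evict-remove-updates-target branch.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class EvictionThresholds:
    # Percent of configured cache size. Illustrative values; only
    # dirty_trigger=20 matches a number quoted in this ticket.
    target: float = 80.0
    trigger: float = 95.0
    dirty_target: float = 5.0
    dirty_trigger: float = 20.0
    # None models the branch that removed the updates dimension.
    updates_target: Optional[float] = 2.5
    updates_trigger: Optional[float] = 10.0


def eviction_flags(t: EvictionThresholds, used_pct: float,
                   dirty_pct: float, updates_pct: float) -> Set[str]:
    """Hypothetical flags raised by a cache state.

    Crossing a target raises the soft flag (background eviction);
    crossing a trigger raises the *_HARD flag (app threads may help).
    """
    dimensions = [
        ("CLEAN", used_pct, t.target, t.trigger),
        ("DIRTY", dirty_pct, t.dirty_target, t.dirty_trigger),
        ("UPDATES", updates_pct, t.updates_target, t.updates_trigger),
    ]
    flags: Set[str] = set()
    for name, value, target, trigger in dimensions:
        if target is not None and value > target:
            flags.add(f"WT_EVICT_CACHE_{name}")
        if trigger is not None and value > trigger:
            flags.add(f"WT_EVICT_CACHE_{name}_HARD")
    return flags


# Update-heavy state: cache 85% full, dirty 3%, updates 12%.
state = dict(used_pct=85.0, dirty_pct=3.0, updates_pct=12.0)
master = eviction_flags(EvictionThresholds(), **state)
branch = eviction_flags(
    EvictionThresholds(updates_target=None, updates_trigger=None), **state)
print(sorted(master))
# ['WT_EVICT_CACHE_CLEAN', 'WT_EVICT_CACHE_UPDATES', 'WT_EVICT_CACHE_UPDATES_HARD']
print(sorted(branch))
# ['WT_EVICT_CACHE_CLEAN']
```

In this state the model raises both updates flags on master but only the soft clean flag on the branch, so nothing targets the update bytes specifically.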
The WT pull_request patch is green. Getting there required relaxing the test_truncate19 oplog-size bound from 600MB to 650MB. The change raises that test's transient peak from ~575MB to ~601MB, because truncated pages pinned by the long-running txn linger in cache longer.
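The test_truncate19 numbers work out as follows. This assumes the test simply compares its observed peak against the bound. `headroom` is a hypothetical helper, and all figures are the approximate ones quoted above.

```python
# Approximate figures quoted above, in MB.
OLD_BOUND_MB = 600
NEW_BOUND_MB = 650
PEAK_MASTER_MB = 575
PEAK_BRANCH_MB = 601


def headroom(bound_mb: int, peak_mb: int) -> int:
    """Margin left under the bound; negative means the bound is exceeded."""
    return bound_mb - peak_mb


print(headroom(OLD_BOUND_MB, PEAK_MASTER_MB))  # 25
print(headroom(OLD_BOUND_MB, PEAK_BRANCH_MB))  # -1
print(headroom(NEW_BOUND_MB, PEAK_BRANCH_MB))  # 49
growth_pct = 100 * (PEAK_BRANCH_MB - PEAK_MASTER_MB) / PEAK_MASTER_MB
print(f"{growth_pct:.1f}%")  # 4.5%
```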
Results (sys-perf vs master 7a072ec391d)
64 significant movers (|z| > 2): 47 regressions, 17 improvements.
Regressions (worse):
- tpce_locust latency +40% to +75% (avg and p95), throughput -1% to -3%
- update-heavy YCSB: in_cache 95read5update update latency +45%, ops -6%
- ecommerce_locust p50 +7% to +16%; find_one_and_update +9% to +12%
Improvements (read-resident workloads):
- linkbench2 latency -6% to -11%; in_cache YCSB read latency -15.8%; bulk_insert p50 -26%; tsbs bulk load +4.6%
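How a mover is classified depends on the metric's direction. A positive z-score is a regression for latency but an improvement for throughput. sys-perf's actual significance method is not described in this ticket, so the following is a generic z-score sketch. It assumes the baseline is a set of repeated master runs. `Metric`, `classify` and the sample numbers are made up for illustration and are not from this run.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List


@dataclass
class Metric:
    name: str
    baseline: List[float]   # repeated master runs (made-up samples)
    candidate: float        # patch-build result (made-up)
    higher_is_better: bool  # False for latency, True for throughput


def classify(m: Metric, threshold: float = 2.0) -> str:
    """Label a metric by the z-score of the candidate against the baseline."""
    z = (m.candidate - mean(m.baseline)) / stdev(m.baseline)
    if abs(z) <= threshold:
        return "noise"
    moved_up = z > 0
    return "improvement" if moved_up == m.higher_is_better else "regression"


latency = Metric("p95 latency", [10.0, 10.5, 9.5], 15.0, higher_is_better=False)
throughput = Metric("ops/sec", [100.0, 102.0, 98.0], 104.6, higher_is_better=True)
print(classify(latency))     # regression
print(classify(throughput))  # improvement
```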
FTDC root-cause (tpce_locust primary, 49-min run)
| metric | value | interpretation |
|---|---|---|
| bytes in cache | ~11GB avg / 16.5GB peak | near-full, leaf-dominated (16.48GB) |
| tracked dirty bytes | 372MB avg / 1.74GB peak | modest, well under the 20% dirty trigger |
| app-thread cache-miss reads | 532K avg / 3.68M peak | heavy read-back from disk |
| app-thread eviction writes | ~52s cumulative | app threads pulled into eviction |
| eviction server no-progress sleeps | 677K peak | eviction struggling |
| write/read ticket queue depth | 47 / 19 peak | ops stalling behind tickets |
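The table's claim that dirty bytes stay well under the 20% dirty trigger can be checked roughly. The ticket does not state the configured cache size. This sketch assumes the cache size is at least the observed 16.5GB peak of bytes in cache. WiredTiger can transiently overshoot its configured size, so treat the result as approximate. Dividing by that floor gives an upper bound on the dirty percentage.

```python
# Figures from the FTDC table above (GB).
PEAK_BYTES_IN_CACHE_GB = 16.5
AVG_DIRTY_GB = 0.372
PEAK_DIRTY_GB = 1.74
DIRTY_TRIGGER_PCT = 20.0


def dirty_pct_upper_bound(dirty_gb: float, cache_size_floor_gb: float) -> float:
    """Dirty bytes as a percent of cache size, using a lower bound on the
    (unstated) configured cache size; the true percentage can only be lower."""
    return 100.0 * dirty_gb / cache_size_floor_gb


avg_pct = dirty_pct_upper_bound(AVG_DIRTY_GB, PEAK_BYTES_IN_CACHE_GB)
peak_pct = dirty_pct_upper_bound(PEAK_DIRTY_GB, PEAK_BYTES_IN_CACHE_GB)
print(f"avg dirty <= {avg_pct:.1f}%, peak dirty <= {peak_pct:.1f}% "
      f"(trigger {DIRTY_TRIGGER_PCT:.0f}%)")
# avg dirty <= 2.3%, peak dirty <= 10.5% (trigger 20%)
```

Even at its peak, the dirty share is about half the trigger. That is consistent with dirty eviction never stepping in.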
Mechanism: the cache fills with update-laden leaf pages (exactly what bytes_updates tracked). Clean eviction skips them because they are not clean. Dirty eviction never reclaims them because dirty bytes stay under the dirty trigger. They crowd the cache, leaving less room for the working set. The result is cache misses and app-thread eviction, which show up as latency spikes. Updates eviction is the relief valve for update-heavy workloads.
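The mechanism can be reduced to a toy page-eligibility model: which pages each active eviction mode may target. This is an illustration, not WiredTiger's real candidate selection. `Page`, `reclaimable_mb` and the page mix in the usage lines are hypothetical. "Clean" follows the paragraph above in excluding update-laden pages.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Page:
    size_mb: int
    dirty: bool         # has modifications not yet written
    update_mb: int = 0  # in-memory update bytes (what bytes_updates tracked)

    @property
    def clean(self) -> bool:
        # Following the mechanism above: update-laden pages are not clean.
        return not self.dirty and self.update_mb == 0


def reclaimable_mb(pages: Iterable[Page], dirty_active: bool,
                   updates_active: bool) -> int:
    """Total size of pages that some currently active eviction mode may target."""
    total = 0
    for p in pages:
        if p.clean:
            total += p.size_mb  # clean eviction can always take these
        elif updates_active and p.update_mb > 0:
            total += p.size_mb  # updates eviction targets update-laden pages
        elif dirty_active and p.dirty:
            total += p.size_mb  # dirty eviction, only when it is active
    return total


# Hypothetical mix: 10 clean pages, 80 update-laden (dirty) pages, 10MB each.
pages = [Page(10, dirty=False) for _ in range(10)]
pages += [Page(10, dirty=True, update_mb=2) for _ in range(80)]

# Dirty bytes stay under the trigger, so dirty eviction is inactive.
print(reclaimable_mb(pages, dirty_active=False, updates_active=True))   # 900
print(reclaimable_mb(pages, dirty_active=False, updates_active=False))  # 100
```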
Confound
The patch also removed the updates dimension from mongo's wiredtiger_cache_pressure_monitor, because the monitor read the deleted cache_bytes_updates stat. That monitor drives dynamic ticket admission; the ticket pool was resized ~1466 times during the run. With updates pressure invisible, the monitor under-throttles, which plausibly amplified the tpce latency. The effect cannot be fully isolated from FTDC alone.
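The confound can be sketched the same way. The ticket does not give the monitor's formula, so this sketch makes two assumptions:

- The pressure score is the worst dimension's usage relative to its trigger.
- A hypothetical admission policy shrinks the ticket pool linearly once the score passes 0.8.

Neither `cache_pressure` nor `admit_fraction` is the mongo implementation. The only point is that dropping a dimension from a max() can only lower the score.

```python
from typing import Dict, Optional


def cache_pressure(used_pct: float, dirty_pct: float,
                   updates_pct: Optional[float],
                   triggers: Dict[str, float]) -> float:
    """Hypothetical score: the worst dimension's usage relative to its trigger.

    updates_pct=None models the patched monitor, which no longer reads the
    deleted cache_bytes_updates stat.
    """
    ratios = [used_pct / triggers["total"], dirty_pct / triggers["dirty"]]
    if updates_pct is not None:
        ratios.append(updates_pct / triggers["updates"])
    return max(ratios)


def admit_fraction(pressure: float, start: float = 0.8) -> float:
    """Hypothetical policy: full ticket pool up to `start`, shrinking
    linearly to zero as the score reaches 1.0."""
    if pressure <= start:
        return 1.0
    return max(0.0, (1.0 - pressure) / (1.0 - start))


triggers = {"total": 95.0, "dirty": 20.0, "updates": 10.0}  # illustrative
with_updates = cache_pressure(85.0, 3.0, 12.0, triggers)
without_updates = cache_pressure(85.0, 3.0, None, triggers)
print(f"{with_updates:.2f} -> admit {admit_fraction(with_updates):.2f}")        # 1.20 -> admit 0.00
print(f"{without_updates:.2f} -> admit {admit_fraction(without_updates):.2f}")  # 0.89 -> admit 0.53
```

With updates at 12% against a 10% trigger, the score stays below 1.0 once updates are ignored. The pool is then only partly shrunk instead of fully throttled.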
Conclusion
Clean/dirty-only eviction is insufficient. Do not remove the updates dimension.
Possible surgical follow-up (untested hypothesis)
Keep the hard eviction_updates_trigger (relief valve) and bytes_updates accounting; remove only the soft eviction_updates_target (worker-thread pre-emptive level). This is mongo-clean (no cache-pressure-monitor change, so the confound disappears) and tests whether the worker pre-emptive target earns its keep, or whether the app-thread trigger alone suffices. Hypothesis only – needs its own patch + comparison; the read-side wins may shrink because the trigger still evicts update pages once they exceed it.
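The proposed follow-up splits the updates dimension into its two halves. Below is a toy model of who acts on update bytes. It assumes the target starts background worker eviction and the trigger also pulls in app threads. Thresholds are illustrative, and `updates_eviction_actor` is hypothetical.

```python
from typing import Optional


def updates_eviction_actor(updates_pct: float, target: Optional[float],
                           trigger: Optional[float]) -> str:
    """Toy model of who acts on update bytes (percent of cache size).

    target: soft level where background worker threads start evicting.
    trigger: hard level where app threads are also pulled in.
    target=None models the proposed follow-up (soft target removed).
    """
    if trigger is not None and updates_pct > trigger:
        return "app_threads"  # relief valve: app threads join eviction
    if target is not None and updates_pct > target:
        return "workers"      # pre-emptive background eviction only
    return "none"


for pct in (1.0, 5.0, 12.0):
    current = updates_eviction_actor(pct, target=2.5, trigger=10.0)
    follow_up = updates_eviction_actor(pct, target=None, trigger=10.0)
    print(f"updates {pct:>4}%: current={current:<11} follow-up={follow_up}")
```

The case that distinguishes the two designs is update bytes between target and trigger. Today workers start evicting there; under the follow-up, nothing happens until the trigger is crossed.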
Artifacts
- related to: WT-15538 Investigate slow eviction behavior when updates ratio is high (Open)