- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Cache and Eviction
- None
- Storage Engines, Storage Engines - Persistence
- 249.796
- SE Persistence backlog
- None
Issue Summary
__evict_page_victim_cache (src/evict/evict_page.c) stores evicted clean leaf pages in the disaggregated victim cache. It is called from __evict_page_clean_update with no thread-type gating, so application threads, not just the eviction server and workers, run this code. We should decide whether it is acceptable for application threads to do this extra work: they reach eviction precisely when the cache is already under pressure, and they are on a latency-sensitive critical path.
Context
Two application-thread paths reach __evict_page_victim_cache, and neither sets WT_EVICT_CALL_CLOSING, so the victim-cache contribution is not skipped:
- App-assist eviction under cache pressure: __wt_evict_app_assist_worker_check -> __wti_evict_app_assist_worker -> __wti_evict_page(session, is_server=false) -> __wt_evict -> __evict_page_clean_update -> __evict_page_victim_cache. Triggered from txn commit/rollback, the page-read wait loop, and compaction.
- Forced/urgent eviction on page release: __wt_page_release_evict -> __wt_evict(... WT_EVICT_CALL_URGENT) -> same clean-update path.
Once past the early-return gates (disaggregated btree, cache-put available, clean non-root leaf with disk image, hot tier), the thread does the following inline while holding the page locked: full-page compression (__wt_blkcache_compress), a whole-buffer checksum (__wt_checksum), a page-header byteswap, and a plh_cache_put into the page-log layer (whose cost is implementation-defined and potentially the dominant term).
Trade-off: adding this work to an application thread extends a user-operation stall that is already happening under cache pressure; but the evicted page is hot and on the hot tier, so caching it on the way out avoids a likely-more-expensive re-fetch from the page-log layer later. There is currently no visibility into how much of the victim-cache fill comes from application threads.
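The early-return gates listed in Context can be modeled as a single predicate. The following is a minimal, self-contained sketch, not the WiredTiger code: the struct page_view type, its field names, and victim_cache_eligible are hypothetical names introduced only for illustration, and the real checks in __evict_page_victim_cache may be ordered or expressed differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the state the gates inspect (not a WiredTiger type). */
struct page_view {
    bool btree_disaggregated; /* btree lives on disaggregated storage */
    bool cache_put_available; /* the page-log handle offers a cache-put hook */
    bool is_clean;            /* page is being evicted clean */
    bool is_root;             /* root pages are never cached */
    bool is_leaf;             /* only leaf pages are cached */
    bool has_disk_image;      /* a disk image exists to compress and checksum */
    bool hot_tier;            /* page belongs to the hot tier */
};

/* Return true only when every gate named in the ticket passes. */
static bool
victim_cache_eligible(const struct page_view *p)
{
    assert(p != NULL);

    if (!p->btree_disaggregated || !p->cache_put_available)
        return false;
    if (!p->is_clean || p->is_root || !p->is_leaf)
        return false;
    if (!p->has_disk_image || !p->hot_tier)
        return false;
    return true;
}
```

Note that nothing in this predicate looks at the calling thread, which is the gap this ticket is about.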
Proposed Solution
Investigate and decide between the two options below. Note these are not mutually exclusive: option 3 can inform option 2.
Option 2: Gate the victim-cache contribution by thread type. Make __evict_page_victim_cache (or its caller) skip the victim-cache put when the evicting thread is an application thread, at least for the interruptible app-assist path, where the thread is least committed to the eviction. Forced/urgent eviction (__wt_page_release_evict) could still cache, since that page is definitely leaving the cache. The check would key off is_server / WT_SESSION_INTERNAL. Trade-off to evaluate: under heavy pressure application threads perform a large share of evictions, so gating them out could significantly reduce victim-cache coverage exactly when it matters most.
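One possible shape for the option 2 policy is sketched below. It is an illustration under stated assumptions, not the actual implementation: the enum evict_caller type and victim_cache_put_allowed are hypothetical, and in WiredTiger the caller kind would have to be derived from is_server / WT_SESSION_INTERNAL and the eviction call flags rather than passed in directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the evicting caller (not a WiredTiger type). */
enum evict_caller {
    CALLER_EVICTION_THREAD, /* eviction server or worker */
    CALLER_APP_ASSIST,      /* application thread on the interruptible app-assist path */
    CALLER_APP_FORCED       /* application thread doing forced/urgent eviction on page release */
};

/* Decide whether this caller may pay for the victim-cache put. */
static bool
victim_cache_put_allowed(enum evict_caller caller)
{
    switch (caller) {
    case CALLER_EVICTION_THREAD:
        return true;  /* background threads always contribute */
    case CALLER_APP_FORCED:
        return true;  /* the page is definitely leaving the cache */
    case CALLER_APP_ASSIST:
        return false; /* keep the user operation's stall as short as possible */
    }
    assert(0 && "unknown caller kind");
    return false;
}
```

Keeping the decision in one small function makes it easy to flip the CALLER_APP_FORCED case once the coverage impact has been measured.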
Option 3: Add a statistic and measure first. Before changing any policy, add a connection statistic (e.g. block_cache_app_thread_puts) that counts victim-cache puts performed by application threads, so we can quantify what fraction of victim-cache fill is application-driven and what it costs. Use that data to decide whether option 2 (or an async plh_cache_put) is worth pursuing.
Suggested stats:
# Counters: frequency, split by role (alongside block_cache_cold_not_cached)
BlockCacheStat('block_cache_app_thread_puts', 'pages added to the disaggregated victim cache during eviction by application threads'),
BlockCacheStat('block_cache_eviction_thread_puts', 'pages added to the disaggregated victim cache during eviction by eviction threads'),
# Cost: time application threads spend on the victim-cache put path
EvictStat('eviction_app_victim_cache_time', 'time (usecs) application threads spent putting pages into the victim cache', 'no_clear'),
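To show how these counters would answer the question of what fraction of victim-cache fill is application-driven, here is a small standalone model in C. It is a sketch only: struct victim_cache_stats, victim_cache_record_put, and victim_cache_app_fraction are hypothetical names, it uses C11 atomics rather than WiredTiger's own statistics macros, and the elapsed time is assumed to be measured by the caller around the put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three suggested statistics. */
struct victim_cache_stats {
    atomic_uint_fast64_t app_thread_puts;      /* models block_cache_app_thread_puts */
    atomic_uint_fast64_t eviction_thread_puts; /* models block_cache_eviction_thread_puts */
    atomic_uint_fast64_t app_put_usecs;        /* models eviction_app_victim_cache_time */
};

/* Record one completed victim-cache put; time is accumulated for application threads only. */
static void
victim_cache_record_put(struct victim_cache_stats *s, bool is_app_thread, uint64_t elapsed_usecs)
{
    assert(s != NULL);

    if (is_app_thread) {
        atomic_fetch_add_explicit(&s->app_thread_puts, 1, memory_order_relaxed);
        atomic_fetch_add_explicit(&s->app_put_usecs, elapsed_usecs, memory_order_relaxed);
    } else
        atomic_fetch_add_explicit(&s->eviction_thread_puts, 1, memory_order_relaxed);
}

/* Fraction of all recorded puts that were performed by application threads. */
static double
victim_cache_app_fraction(struct victim_cache_stats *s)
{
    uint64_t app, evict, total;

    assert(s != NULL);
    app = atomic_load_explicit(&s->app_thread_puts, memory_order_relaxed);
    evict = atomic_load_explicit(&s->eviction_thread_puts, memory_order_relaxed);
    total = app + evict;
    return total == 0 ? 0.0 : (double)app / (double)total;
}
```

Relaxed ordering is enough here because the counters are independent tallies read only for reporting; dividing the accumulated time by the application put count then gives a mean per-put cost.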
Definition of Done
- A recommendation documented on this ticket: keep current behavior, gate by thread type (option 2), and/or add measurement (option 3).
- If option 3 is chosen as the first step, the stat is added and wired through codegen.
- If option 2 is chosen, the gating is implemented with the interruptible vs forced-eviction distinction considered, and the coverage impact is assessed.
- has to be done before: WT-17833 Do not let application threads insert pages into the disaggregated victim/block cache (Open)
- related to: WT-17832 Add victim-cache compression latency histogram stat (Closed)
- related to: WT-17854 Add block_cache_put_time_max statistic for worst-case victim cache put latency (Closed)
- split from: WT-17801 Identify statistics to track block cache contention (Closed)