- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Block Cache
- None
- Storage Engines, Storage Engines - Persistence
- 228.341
- SE Persistence backlog
- None
Issue Summary
Add a block_cache_put_time_max connection statistic that records the maximum time, in microseconds, spent adding a single page to the disaggregated victim cache. It complements the cumulative block_cache_put_time and the per-thread-role put counters added under WT-17850. The cumulative time, divided by the number of puts, gives the average per-put cost, but only a maximum surfaces the worst-case latency an operation pays on the eviction/victim-cache critical path. That tail signal is what we need to decide whether application threads should be allowed to contribute to the victim cache.
This was prototyped in the WT-17850 PR but pulled out, because how we want the maximum to behave over time still needs to be defined (see Open question below).
Context
The victim-cache put happens in __evict_page_victim_cache (src/evict/evict_page.c), where we already measure the cumulative put time. The maximum can follow the existing eviction maximum-latency pattern, e.g. evict_max_ms / eviction_maximum_milliseconds:
- A running maximum is held in a wt_shared uint64_t field on WT_EVICT (src/evict/evict.h), e.g. evict_victim_cache_max_put_us.
- It is updated at the put site with __wt_atomic_stats_max_uint64(&conn->evict->evict_victim_cache_max_put_us, elapsed), where elapsed is the WT_CLOCKDIFF_US already computed for block_cache_put_time.
- The field is copied into the statistic in __wt_evict_stats_init (src/evict/evict_conn.c) via WT_STATP_CONN_SET(session, stats, block_cache_put_time_max, ...). That function runs on every connection-stats read (called from __wt_conn_stat_init), so the statistic stays live. This is exactly how eviction_maximum_milliseconds is wired.
- The stat is declared in dist/stat_data.py as a BlockCacheStat and the derived code regenerated with dist/stat.py.
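The update-and-publish pattern in the list above can be sketched in portable C11. This is a standalone illustration, not WiredTiger code: struct evict_sketch, stats_max_u64 and stats_read_max are hypothetical stand-ins for WT_EVICT, __wt_atomic_stats_max_uint64 and the copy done in __wt_evict_stats_init, and the real helper may be implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the WT_EVICT field; not the real structure. */
struct evict_sketch {
    _Atomic uint64_t victim_cache_max_put_us;
};

/*
 * Raise *maxp to value if value is larger. Assumed analogue of an atomic
 * "stats max" helper: a compare-and-swap loop that gives up as soon as
 * another thread has already published a larger value.
 */
static void
stats_max_u64(_Atomic uint64_t *maxp, uint64_t value)
{
    uint64_t cur;

    assert(maxp != NULL);
    cur = atomic_load_explicit(maxp, memory_order_relaxed);
    /* On CAS failure, cur is refreshed with the current value and re-tested. */
    while (value > cur &&
      !atomic_compare_exchange_weak_explicit(
        maxp, &cur, value, memory_order_relaxed, memory_order_relaxed))
        ;
}

/* What a stats refresh would copy into the statistic on each read. */
static uint64_t
stats_read_max(struct evict_sketch *evict)
{
    assert(evict != NULL);
    return (atomic_load_explicit(&evict->victim_cache_max_put_us, memory_order_relaxed));
}
```

At the put site, the elapsed microseconds already computed for block_cache_put_time would be passed as value. Relaxed ordering is assumed to be sufficient here because the field is a statistic and does not guard any other data.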
Open question - how the maximum behaves over time
A plain lifetime maximum (no_clear, like evict_max_ms) only ever ratchets up, which is of limited use in FTDC. We likely want a maximum over a collection period. Two patterns exist in the codebase:
- Per-checkpoint reset - mirror evict_max_ms_per_checkpoint, which is reset to 0 at a period boundary (see src/checkpoint/checkpoint_txn.c). Cheap, but tied to checkpoint cadence rather than the FTDC sampling interval.
- Clear-on-read - make the statistic clearable and zero the backing WT_EVICT field when stats are read with statistics=(clear). This matches the FTDC sampling interval but needs extra wiring, since the generated clear path only zeroes the stat array, not the WT_EVICT field.
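To make the two options concrete, here is a minimal C11 sketch of both reset styles against a single backing field. All names (struct evict_reset_sketch, record_put_us, period_boundary_reset, read_max) are hypothetical and chosen for illustration; how the real clear path would reach the WT_EVICT field is exactly the extra wiring the second option mentions and is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backing field, standing in for the one on WT_EVICT. */
struct evict_reset_sketch {
    _Atomic uint64_t victim_cache_max_put_us;
};

/* Record one put: keep the larger of the stored and the new value. */
static void
record_put_us(struct evict_reset_sketch *evict, uint64_t elapsed_us)
{
    uint64_t cur;

    assert(evict != NULL);
    cur = atomic_load_explicit(&evict->victim_cache_max_put_us, memory_order_relaxed);
    while (elapsed_us > cur &&
      !atomic_compare_exchange_weak_explicit(&evict->victim_cache_max_put_us, &cur,
        elapsed_us, memory_order_relaxed, memory_order_relaxed))
        ;
}

/* Option 1: zero the field at a period boundary (for example, a checkpoint). */
static void
period_boundary_reset(struct evict_reset_sketch *evict)
{
    assert(evict != NULL);
    atomic_store_explicit(&evict->victim_cache_max_put_us, 0, memory_order_relaxed);
}

/*
 * Option 2: clear-on-read. When clear is set, swap in zero and return the old
 * maximum in one atomic step, so a concurrent put lands either in the value
 * returned here or in the next sampling period, never in neither.
 */
static uint64_t
read_max(struct evict_reset_sketch *evict, bool clear)
{
    assert(evict != NULL);
    if (clear)
        return (atomic_exchange_explicit(
          &evict->victim_cache_max_put_us, 0, memory_order_relaxed));
    return (atomic_load_explicit(&evict->victim_cache_max_put_us, memory_order_relaxed));
}
```

With the first option the reader sees a maximum since the last boundary. With the second, each clearing read returns the maximum since the previous clearing read, which lines up with the sampler's own interval.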
Decide which semantics we want before implementing.
Proposed Solution
- Add the evict_victim_cache_max_put_us field to WT_EVICT and update it at the put site in __evict_page_victim_cache.
- Refresh block_cache_put_time_max from it in __wt_evict_stats_init.
- Implement the chosen reset semantics (per-collection-period vs lifetime) from the open question above.
Definition of Done
- block_cache_put_time_max is defined, populated, and validated by dist/s_all stat checks.
- The reset behaviour is decided and implemented.
- A short note on the chosen semantics is recorded on this ticket.