Add block_cache_put_time_max statistic for worst-case victim cache put latency


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0, 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: Block Cache
    • None
    • Storage Engines, Storage Engines - Persistence
    • 228.341
    • SE Persistence backlog
    • None

      Issue Summary

      Add a block_cache_put_time_max connection statistic that records the maximum time, in microseconds, spent adding a single page to the disaggregated victim cache. This complements the cumulative block_cache_put_time and the per-thread-role put counters added under WT-17850. Dividing the cumulative time by the put count gives the average per-put cost, but only a maximum surfaces the worst-case latency an operation pays on the eviction/victim-cache critical path. This tail signal is what we need to decide whether application threads should be allowed to contribute to the victim cache.
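      To show why an average alone hides the tail, here is a minimal sketch over a hypothetical batch of put latencies. The struct and function names are illustrative only and are not WiredTiger code. The sketch accumulates the same three quantities the statistics would expose: a count, a cumulative time, and a maximum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary mirroring the statistics: not a WiredTiger type. */
struct put_summary {
    uint64_t count;    /* like a put counter */
    uint64_t total_us; /* like the cumulative block_cache_put_time */
    uint64_t max_us;   /* like the proposed block_cache_put_time_max */
};

/* Fold a batch of per-put latencies (microseconds) into the summary. */
static struct put_summary
summarize_puts(const uint64_t *lat_us, size_t n)
{
    struct put_summary s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        s.count++;
        s.total_us += lat_us[i];
        if (lat_us[i] > s.max_us)
            s.max_us = lat_us[i];
    }
    return s;
}

/* Average per-put cost derived from cumulative time and count; 0 if no puts. */
static uint64_t
average_put_us(const struct put_summary *s)
{
    return s->count == 0 ? 0 : s->total_us / s->count;
}
```

      Suppose there are 99 puts at 40 us and one at 5000 us. The average comes out at 89 us, but the maximum of 5000 us is the latency the unlucky eviction actually paid.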

      This was prototyped in the WT-17850 PR but pulled out, because we have not yet defined how the maximum should behave over time (see the open question below).

      Context

      The victim-cache put happens in __evict_page_victim_cache (src/evict/evict_page.c), where we already measure the cumulative put time. The maximum can follow the existing eviction maximum-latency pattern, e.g. evict_max_ms / eviction_maximum_milliseconds:

      • A running maximum is held in a wt_shared uint64_t field on WT_EVICT (src/evict/evict.h), e.g. evict_victim_cache_max_put_us.
      • It is updated at the put site with __wt_atomic_stats_max_uint64(&conn->evict->evict_victim_cache_max_put_us, elapsed), where elapsed is the WT_CLOCKDIFF_US already computed for block_cache_put_time.
      • The field is copied into the statistic in __wt_evict_stats_init (src/evict/evict_conn.c) via WT_STATP_CONN_SET(session, stats, block_cache_put_time_max, ...). That function runs on every connection-stats read (it is called from __wt_conn_stat_init), so the statistic stays live. This is exactly how eviction_maximum_milliseconds is wired.
      • The stat is declared in dist/stat_data.py as a BlockCacheStat and the derived code regenerated with dist/stat.py.
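      The running-maximum pattern above can be sketched in standalone C11. This is an illustration of the technique under stated assumptions, not the WiredTiger implementation. The struct, the field and the helper below are hypothetical stand-ins for WT_EVICT, evict_victim_cache_max_put_us and __wt_atomic_stats_max_uint64. The real helper's signature and internals are not confirmed here. The helper is a lock-free compare-and-swap loop that only ever raises the stored value. The stats read is a plain atomic load of that value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant WT_EVICT field. */
struct evict_sketch {
    _Atomic uint64_t victim_cache_max_put_us; /* running max, feeds block_cache_put_time_max */
};

/*
 * Raise *maxp to v if v is larger. On CAS failure, cur is reloaded with the
 * current value, so the loop exits as soon as another thread has stored
 * something >= v, or after our store succeeds.
 */
static void
atomic_max_u64(_Atomic uint64_t *maxp, uint64_t v)
{
    uint64_t cur = atomic_load_explicit(maxp, memory_order_relaxed);
    while (v > cur &&
      !atomic_compare_exchange_weak_explicit(
        maxp, &cur, v, memory_order_relaxed, memory_order_relaxed))
        ;
}

/* Put site: elapsed_us is the duration already measured for the cumulative stat. */
static void
record_put(struct evict_sketch *ev, uint64_t elapsed_us)
{
    atomic_max_u64(&ev->victim_cache_max_put_us, elapsed_us);
}

/* Stats read: the value that would be copied into block_cache_put_time_max. */
static uint64_t
stats_read_put_time_max(struct evict_sketch *ev)
{
    return atomic_load_explicit(&ev->victim_cache_max_put_us, memory_order_relaxed);
}
```

      Relaxed ordering is enough here because the maximum is an independent statistic. No other data is published through it.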

      Open question: how the maximum behaves over time

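      A plain lifetime maximum (no_clear, like evict_max_ms) only ever ratchets up. That makes it of limited use in FTDC: once a single slow put has occurred, later samples cannot show whether tail latency has since improved. We likely want a maximum over a collection period instead. Two patterns exist in the codebase: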

      • Per-checkpoint reset - mirror evict_max_ms_per_checkpoint, which is reset to 0 at a period boundary (see src/checkpoint/checkpoint_txn.c). Cheap, but tied to checkpoint cadence rather than the FTDC sampling interval.
      • Clear-on-read - make the statistic clearable and zero the backing WT_EVICT field when stats are read with statistics=(clear). This matches the FTDC sampling interval but needs extra wiring, since the generated clear path only zeroes the stat array, not the WT_EVICT field.
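      The two reset options can be compared in a small standalone C11 sketch. All names are hypothetical and are not WiredTiger's. A lifetime field is kept alongside a period field for contrast. The per-checkpoint option zeroes the period field at the boundary. The clear-on-read option is the extra wiring the generated clear path lacks: it zeroes the backing field itself during a stats read with clear requested. It uses an atomic exchange so that reading and zeroing happen in one step.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backing fields; names are illustrative only. */
struct put_max_sketch {
    _Atomic uint64_t lifetime_max_us; /* never reset (no_clear semantics) */
    _Atomic uint64_t period_max_us;   /* reset per checkpoint or on clear */
};

/* Lock-free "store if larger" via a compare-and-swap loop. */
static void
atomic_max_u64(_Atomic uint64_t *maxp, uint64_t v)
{
    uint64_t cur = atomic_load_explicit(maxp, memory_order_relaxed);
    while (v > cur &&
      !atomic_compare_exchange_weak_explicit(
        maxp, &cur, v, memory_order_relaxed, memory_order_relaxed))
        ;
}

/* Put site: update both maxima with the measured put duration. */
static void
put_max_record(struct put_max_sketch *m, uint64_t elapsed_us)
{
    atomic_max_u64(&m->lifetime_max_us, elapsed_us);
    atomic_max_u64(&m->period_max_us, elapsed_us);
}

/* Option 1: per-checkpoint reset, called at the period boundary. */
static void
put_max_checkpoint_reset(struct put_max_sketch *m)
{
    atomic_store_explicit(&m->period_max_us, 0, memory_order_relaxed);
}

/* Option 2: clear-on-read. Exchange returns the period max and zeroes it at once. */
static uint64_t
put_max_stats_read(struct put_max_sketch *m, bool clear)
{
    if (clear)
        return atomic_exchange_explicit(&m->period_max_us, 0, memory_order_relaxed);
    return atomic_load_explicit(&m->period_max_us, memory_order_relaxed);
}
```

      With a separate load followed by a store of 0, a larger put that lands between the two operations would be lost. The exchange avoids that: such a put is reported either in this read or in the next period.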

      Decide which semantics we want before implementing.

      Proposed Solution

      • Add the evict_victim_cache_max_put_us field to WT_EVICT and update it at the put site in __evict_page_victim_cache.
      • Refresh block_cache_put_time_max from it in __wt_evict_stats_init.
      • Implement the chosen reset semantics (per-collection-period vs lifetime) from the open question above.

      Definition of Done

      • block_cache_put_time_max is defined, populated, and validated by dist/s_all stat checks.
      • The reset behaviour is decided and implemented.
      • A short note on the chosen semantics is recorded on this ticket.

            Assignee:
            Etienne Petrel
            Reporter:
            Etienne Petrel
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: