Do not let application threads insert pages into the disaggregated victim/block cache

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Block Cache

      Issue Summary

      On disaggregated-storage btrees, the victim/block cache is populated on the eviction path: when a clean leaf page is evicted, __evict_page_victim_cache() compresses the page, checksums it, byteswaps the header, and inserts it via plh_cache_put(). This caching work runs on whatever thread is performing the eviction — including application threads doing foreground eviction under cache pressure. When that happens, the compress + checksum + put cost lands directly on the latency-critical path of the user operation that triggered the eviction.

      We should not let application threads add pages to the victim cache. Only background eviction workers should populate it. This bounds the block cache's impact on user-operation tail latency without removing its benefit, since background eviction performs the bulk of eviction work anyway.

      Context

      * Observed on a find_one_and_update_locust workload (test Aggregated) comparing a baseline build vs a block-cache-enabled build (9.0 alpha, PALI 2-node replica set). FTDC analysis showed: avg findAndModify (write) latency +9.7%, the server-side write-latency histogram tail shifting right (the Latency95thPercentile regression), getPage wait p99 +11%, and log-append send p99 up ~2.5x.
      * The block cache hit rate on this workload was ~0.26% (643 hits / 248,717 requests), so the inserts were almost pure overhead — but that is a separate admission-policy problem. This ticket is scoped to the orthogonal fix of keeping inserts off the foreground/app-thread path.
      * CPU usage actually dropped while latency rose, indicating the regression is added blocking/waiting (eviction-time cache insert), not extra compute on the workload itself.
      * Affected path: __evict_page_victim_cache() in src/evict/evict_page.c, called from __evict_page_clean_update(). The function runs on the generic eviction path, which is exercised by both background eviction workers and application (foreground) eviction.

      Proposed Solution

      * In __evict_page_victim_cache() (or at its call site in __evict_page_clean_update()), detect whether the current session/thread is an application thread performing foreground eviction or a background eviction worker.
      * If it is an application thread, skip the victim-cache insert entirely (just discard the page as in the no-block-cache path) — do not compress, checksum, or call plh_cache_put().
      * Only background eviction workers should populate the victim cache.
      * Optionally add a statistic counting inserts skipped due to foreground eviction, to quantify how often this path is hit.
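
      The gating described in the list above could look roughly like the following. This is a minimal, standalone sketch and not the actual WiredTiger code: the sketch_session and sketch_stats types, the is_eviction_worker field, and both counter names are hypothetical stand-ins invented for illustration, and the real compress + checksum + byteswap + plh_cache_put() work is reduced to a counter increment. How the real code distinguishes an eviction worker from an application thread is left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are WiredTiger's real types or flags. */
typedef struct {
    bool is_eviction_worker; /* true only for a background eviction worker */
} sketch_session;

typedef struct {
    uint64_t victim_cache_inserts;     /* pages handed to the victim cache */
    uint64_t victim_cache_skipped_app; /* inserts skipped on application threads */
} sketch_stats;

/*
 * Stand-in for the expensive part of the real path: compress the page,
 * checksum it, byteswap the header, and call plh_cache_put().
 */
static void
sketch_victim_cache_insert(sketch_stats *stats)
{
    ++stats->victim_cache_inserts;
}

/*
 * Called when a clean leaf page is evicted. Returns true if the page was
 * inserted into the victim cache, false if it was simply discarded.
 */
static bool
sketch_evict_page_victim_cache(const sketch_session *session, sketch_stats *stats)
{
    /*
     * Application thread doing foreground eviction: skip the insert before
     * any compression or checksum work, and count the skip.
     */
    if (!session->is_eviction_worker) {
        ++stats->victim_cache_skipped_app;
        return false;
    }

    /* Background eviction worker: populate the cache as before. */
    sketch_victim_cache_insert(stats);
    return true;
}
```

      Placing the check ahead of any compression or checksum work matters: if the test happened only around the final put, the application thread would still pay most of the cost. The skip counter gives a direct way to see how often foreground eviction reaches this path.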

      Definition of Done

      * Application threads never perform victim-cache inserts (verified by stat and/or test).
      * Background eviction workers continue to populate the cache as before.
      * Re-run the find_one_and_update_locust / Aggregated comparison and confirm the Latency95thPercentile regression is reduced.

            Assignee:
            Unassigned
            Reporter:
            Etienne Petrel