Investigate and improve prefetch statistics

    • Type: Task
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: WT12.0.0, 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: Prefetch
    • None
    • Storage Engines - Persistence
    • 258.309
    • SE Persistence - 2026-05-08
    • None

      WiredTiger has several prefetch statistics that are redundant and make diagnostics harder to interpret. Specifically, prefetch_skipped ("pre-fetch not triggered by page read") is incremented alongside a more specific stat at every call site, so the aggregate counter carries no additional information.

      Context

      In src/session/session_prefetch.c, every early-return path increments both a reason-specific stat and the generic prefetch_skipped:

      /* Skip: the read was issued by an internal session. */
      if (F_ISSET(session, WT_SESSION_INTERNAL)) {
          WT_STAT_CONN_INCR(session, prefetch_skipped_internal_session);
          WT_STAT_CONN_INCR(session, prefetch_skipped);
          return (false);
      }

      /* Skip: the ref is an internal page. */
      if (F_ISSET(ref, WT_REF_FLAG_INTERNAL)) {
          WT_STAT_CONN_INCR(session, prefetch_skipped_internal_page);
          WT_STAT_CONN_INCR(session, prefetch_skipped);
          return (false);
      }

      /* Skip: the btree has special flags set and is not being verified. */
      if (F_ISSET(S2BT(session), WT_BTREE_SPECIAL_FLAGS) &&
        !F_ISSET(S2BT(session), WT_BTREE_VERIFY)) {
          WT_STAT_CONN_INCR(session, prefetch_skipped_special_handle);
          WT_STAT_CONN_INCR(session, prefetch_skipped);
          return (false);
      }

      /* Skip: the session's prefetch disk read count is below 2. */
      if (session->pf.prefetch_disk_read_count < 2) {
          WT_STAT_CONN_INCR(session, prefetch_skipped_disk_read_count);
          WT_STAT_CONN_INCR(session, prefetch_skipped);
          return (false);
      }
      

      Because prefetch_skipped always equals the sum of the per-reason skip stats, it provides no diagnostic value beyond what summing those counters gives. This pattern likely extends to other prefetch aggregate stats.
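      The redundancy can be illustrated with a small standalone model of the early-return paths above. This is only a sketch, not WiredTiger code: struct model_stats and the model_* helpers are hypothetical stand-ins for the connection statistics and for the paired WT_STAT_CONN_INCR calls, and the field names simply reuse the stat names quoted in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters touched by the four skip paths. */
struct model_stats {
    uint64_t prefetch_skipped; /* the generic aggregate */
    uint64_t prefetch_skipped_internal_session;
    uint64_t prefetch_skipped_internal_page;
    uint64_t prefetch_skipped_special_handle;
    uint64_t prefetch_skipped_disk_read_count;
};

/* Model of one early-return path: bump the specific counter and the aggregate together. */
static void
model_skip(struct model_stats *stats, uint64_t *specific)
{
    /* The caller must pass a per-reason counter, never the aggregate itself. */
    assert(specific != &stats->prefetch_skipped);
    ++*specific;
    ++stats->prefetch_skipped;
}

/* Sum of the per-reason counters, as tooling could compute it. */
static uint64_t
model_sum_specific(const struct model_stats *stats)
{
    return (stats->prefetch_skipped_internal_session +
      stats->prefetch_skipped_internal_page +
      stats->prefetch_skipped_special_handle +
      stats->prefetch_skipped_disk_read_count);
}

/* True when the aggregate equals the sum of the per-reason counters. */
static bool
model_aggregate_is_redundant(const struct model_stats *stats)
{
    return (stats->prefetch_skipped == model_sum_specific(stats));
}
```

      As long as every increment goes through a paired path like model_skip, model_aggregate_is_redundant holds after any sequence of skips. An aggregate that is also incremented on its own somewhere would break this equality and would carry independent information.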

      • Redundant stats make dashboards and diagnostic scripts harder to interpret.
      • Operators and developers must cross-reference multiple counters to understand prefetch skip rates by cause.
      • Removing or restructuring the duplicate counter would simplify analysis without loss of information.

      Proposed Solution

      1. Audit all prefetch stats in src/stat/stat_data.py and their call sites to identify which stats are always co-incremented with a more specific companion.
      2. Determine whether prefetch_skipped (or similar aggregate stats) adds value as a roll-up, or whether it can be removed in favour of summing the per-reason counters in tooling/monitoring.
      3. Propose and implement a cleaner stat design: either remove the redundant aggregate, rename stats for clarity, or add documentation strings that make the stat relationships explicit.
      4. Update any downstream tooling or dashboards that rely on removed/renamed stats.
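
      One possible shape for step 3 is sketched below, assuming the aggregate is dropped and derived at read time. All names here (pf_skip_reason, pf_skip_stats, pf_skip_record, pf_skip_total) are hypothetical and do not exist in WiredTiger; the real change would go through stat_data.py and the existing stat macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical skip reasons, one per early-return path listed in this ticket. */
enum pf_skip_reason {
    PF_SKIP_INTERNAL_SESSION,
    PF_SKIP_INTERNAL_PAGE,
    PF_SKIP_SPECIAL_HANDLE,
    PF_SKIP_DISK_READ_COUNT,
    PF_SKIP_REASON_COUNT /* number of reasons, not a reason itself */
};

/* Hypothetical stats block: one counter per reason and no stored aggregate. */
struct pf_skip_stats {
    uint64_t by_reason[PF_SKIP_REASON_COUNT];
};

/* Record one skipped prefetch; each call site increments exactly one counter. */
static void
pf_skip_record(struct pf_skip_stats *stats, enum pf_skip_reason reason)
{
    assert((int)reason >= 0 && (int)reason < PF_SKIP_REASON_COUNT);
    stats->by_reason[reason]++;
}

/* Derive the roll-up on demand instead of maintaining it at every call site. */
static uint64_t
pf_skip_total(const struct pf_skip_stats *stats)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < PF_SKIP_REASON_COUNT; i++)
        total += stats->by_reason[i];
    return (total);
}
```

      With this layout a call site would make a single pf_skip_record call before returning false, and dashboards that still want a total would call pf_skip_total or sum the per-reason counters themselves.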

      Definition of Done

      • All redundant prefetch stats are either removed or documented with clear semantics.
      • No stat is silently co-incremented with another without a clear reason.
      • The change is reflected in stat_data.py and all relevant call sites.

            Assignee:
            Etienne Petrel
            Reporter:
            Etienne Petrel