Skip ingest cursor open in leader mode for layered cursors

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Cursors
    • Storage Engines - Foundations
    • 69.273

      Summary

      In leader mode the ingest table is empty for reads and is not used for writes: all reads and writes route to the stable table. Despite this, every layered cursor operation still opens, resets and closes the ingest cursor, paying the session cursor-cache / reopen bookkeeping and the dhandle rwlock cost on each call. In disaggregated storage (DSC) those bookkeeping steps translate into extra page-service round trips and measurably hurt read latency.

      This change skips opening the ingest cursor while the layered cursor is in leader mode, and adds NULL guards on the few code paths that previously assumed an ingest cursor always existed.
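      The mode-dependent open described above can be sketched roughly as follows. This is a minimal illustration, not the code in cur_layered.c: the toy_cursor and toy_layered types and the toy_* functions are hypothetical stand-ins, and the sketch assumes the stable cursor is needed in both modes while the ingest cursor is needed only by followers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not WiredTiger types. */
typedef struct {
    int open_count; /* number of times this cursor was opened */
} toy_cursor;

typedef struct {
    bool leader;               /* true in leader mode, false in follower mode */
    toy_cursor *stable_cursor; /* assumed to be needed in both modes */
    toy_cursor *ingest_cursor; /* assumed to be needed only by followers */
    toy_cursor stable_storage; /* backing storage for the toy cursors */
    toy_cursor ingest_storage;
} toy_layered;

/* True when every constituent cursor required by the current mode is open. */
static bool
toy_cursors_ready(const toy_layered *c)
{
    assert(c != NULL);
    if (c->stable_cursor == NULL)
        return (false);
    /* Leader mode treats the ingest requirement as already satisfied. */
    return (c->leader || c->ingest_cursor != NULL);
}

/* Open whatever is missing; the ingest cursor is skipped in leader mode. */
static void
toy_open_cursors(toy_layered *c)
{
    if (toy_cursors_ready(c))
        return;
    if (c->stable_cursor == NULL) {
        c->stable_cursor = &c->stable_storage;
        c->stable_cursor->open_count++;
    }
    if (!c->leader && c->ingest_cursor == NULL) {
        c->ingest_cursor = &c->ingest_storage;
        c->ingest_cursor->open_count++;
    }
}
```

      In this sketch a leader never touches the ingest cursor, so repeated calls cost only the readiness check, while a follower still opens both cursors.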

      Change

      In src/cursor/cur_layered.c:

      • __clayered_enter — only require stable_cursor for leader, only require ingest_cursor for follower when deciding whether to (re)open constituent cursors.
      • __clayered_open_cursors — early-return condition treats ingest as already-satisfied for leader; the ingest cursor is opened only when running as a follower.
      • __clayered_next / __clayered_prev cleanup — guard the ingest_cursor->reset call with a NULL check.
      • __clayered_largest_key — only call ingest_cursor->largest_key when the ingest cursor is open.
      • __clayered_next_random — return WT_NOTFOUND if the ingest fallback cursor is not open.

      The follower path is unchanged.
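      The NULL guards listed above follow one pattern: any path that used to assume an ingest cursor now checks for it first. A minimal sketch under stated assumptions: the toy_* names and TOY_NOTFOUND are hypothetical stand-ins (TOY_NOTFOUND plays the role of WT_NOTFOUND), keys are plain integers, and the "random" lookup simply tries the stable cursor before falling back to ingest, which may not match the real ordering.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOTFOUND (-1) /* hypothetical stand-in for WT_NOTFOUND */

/* Hypothetical toy cursor: holds at most one integer key. */
typedef struct {
    int reset_calls; /* how many times the cursor was reset */
    int key;         /* valid only when has_key is non-zero */
    int has_key;
} toy_cursor;

typedef struct {
    toy_cursor *stable_cursor; /* always open */
    toy_cursor *ingest_cursor; /* NULL while in leader mode */
} toy_layered;

/* Cleanup after next/prev: reset only the cursors that exist. */
static void
toy_cleanup(toy_layered *c)
{
    assert(c != NULL && c->stable_cursor != NULL);
    c->stable_cursor->reset_calls++;
    if (c->ingest_cursor != NULL)
        c->ingest_cursor->reset_calls++;
}

/* Largest key across layers; the ingest layer is consulted only when open. */
static int
toy_largest_key(const toy_layered *c, int *keyp)
{
    int found = 0;

    assert(c != NULL && c->stable_cursor != NULL && keyp != NULL);
    if (c->stable_cursor->has_key) {
        *keyp = c->stable_cursor->key;
        found = 1;
    }
    if (c->ingest_cursor != NULL && c->ingest_cursor->has_key &&
        (!found || c->ingest_cursor->key > *keyp)) {
        *keyp = c->ingest_cursor->key;
        found = 1;
    }
    return (found ? 0 : TOY_NOTFOUND);
}

/* Toy "random" lookup: stable first, then the ingest fallback if it is open. */
static int
toy_next_random(const toy_layered *c, int *keyp)
{
    assert(c != NULL && c->stable_cursor != NULL && keyp != NULL);
    if (c->stable_cursor->has_key) {
        *keyp = c->stable_cursor->key;
        return (0);
    }
    /* No ingest fallback cursor is open in leader mode. */
    if (c->ingest_cursor == NULL || !c->ingest_cursor->has_key)
        return (TOY_NOTFOUND);
    *keyp = c->ingest_cursor->key;
    return (0);
}
```

      With ingest_cursor left NULL, each function degrades to a stable-only operation; with both cursors set, the behavior is the same as before the guards were added.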

      Motivation / measurements

      Validated on the YCSB in-cache 100% read workload from BF-43331 (DSC 11-node, 5-minute FTDC window, per-query normalized):

      metric                              | baseline DSC | with this change | delta
      paliRateLimiter admissions / query  | 1.50         | 1.20             | -20%
      block-manager bytes read / query    | 61.8         | 41.7             | -33%
      CPU user / query (µs)               | 37.8         | 36.2             | -4%

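      Read latency on the affected sys-perf task improved from 835 to 801 (target 776); throughput improved from 153,220 to 159,522 ops/s (target 164,818). Measured against those targets, that is 34 of the 59-point latency gap (about 58%) and 6,302 of the 11,598 ops/s throughput gap (about 54%), so the optimization closes roughly half of the DSC-vs-ASC regression on its own.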

      Related

      Related to BF-43331 — [Atlas Infinite vs Atlas Core] Regression in YCSB in-cache and out-of-cache reads.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Shoufu Du