- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Cursors
- Storage Engines - Foundations
- 69.273
Summary
In leader mode the ingest table is empty for reads and is not used for writes: all reads and writes route to the stable table. Despite this, every layered cursor operation still opens, resets and closes the ingest cursor, paying the session cursor-cache / reopen bookkeeping and the dhandle rwlock cost on each call. In disaggregated storage (DSC) those bookkeeping steps translate into extra page-service round trips and measurably hurt read latency.
This change skips opening the ingest cursor while the layered cursor is in leader mode, and adds NULL guards on the few code paths that previously assumed an ingest cursor always existed.
Change
In src/cursor/cur_layered.c:
- __clayered_enter: when deciding whether to (re)open constituent cursors, require only stable_cursor in leader mode and only ingest_cursor in follower mode.
- __clayered_open_cursors: the early-return condition treats the ingest cursor as already satisfied for a leader; the ingest cursor is opened only when running as a follower.
- __clayered_next / __clayered_prev cleanup: guard the ingest_cursor->reset call with a NULL check.
- __clayered_largest_key: call ingest_cursor->largest_key only when the ingest cursor is open.
- __clayered_next_random: return WT_NOTFOUND if the ingest fallback cursor is not open.
The follower path is unchanged.
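The pattern in the list above can be sketched in isolation. This is a minimal model, not the code in src/cursor/cur_layered.c: every sketch_* name, the struct layout, and SKETCH_NOTFOUND are hypothetical stand-ins, and the sketch assumes the stable cursor is opened in both roles, which simplifies whatever the real follower path does.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NOTFOUND (-1) /* hypothetical stand-in for WT_NOTFOUND */

/* Hypothetical constituent cursor: only counts reset calls. */
typedef struct sketch_cursor {
    int reset_calls;
} sketch_cursor;

/* Hypothetical layered cursor; not the real WiredTiger structure. */
typedef struct sketch_layered {
    bool leader;
    sketch_cursor *ingest_cursor; /* stays NULL in leader mode */
    sketch_cursor *stable_cursor;
    sketch_cursor ingest_storage; /* backing storage for the sketch */
    sketch_cursor stable_storage;
    int opens; /* number of constituent cursors opened */
} sketch_layered;

/* Enter-time check: is the cursor required by the current role missing? */
static bool
sketch_need_open(const sketch_layered *c)
{
    return (c->leader ? c->stable_cursor == NULL : c->ingest_cursor == NULL);
}

/* Open constituents; a leader treats the ingest cursor as already satisfied. */
static void
sketch_open_cursors(sketch_layered *c)
{
    bool ingest_ok = c->leader || c->ingest_cursor != NULL;

    if (ingest_ok && c->stable_cursor != NULL)
        return; /* nothing to do */

    if (!c->leader && c->ingest_cursor == NULL) {
        c->ingest_cursor = &c->ingest_storage;
        c->opens++;
    }
    if (c->stable_cursor == NULL) {
        c->stable_cursor = &c->stable_storage;
        c->opens++;
    }
}

/* Cleanup path: every use of the ingest cursor is NULL-guarded. */
static void
sketch_reset(sketch_layered *c)
{
    if (c->ingest_cursor != NULL)
        c->ingest_cursor->reset_calls++;
    if (c->stable_cursor != NULL)
        c->stable_cursor->reset_calls++;
}

/* Random-position fallback: no open ingest cursor means "not found". */
static int
sketch_next_random_fallback(const sketch_layered *c)
{
    if (c->ingest_cursor == NULL)
        return SKETCH_NOTFOUND;
    return 0;
}
```

In this model a leader opens one constituent cursor instead of two, and the reset path never touches the ingest cursor, which is the bookkeeping the change is meant to avoid.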
Motivation / measurements
Validated on the YCSB in-cache 100% read workload from BF-43331 (DSC 11-node, 5-minute FTDC window, per-query normalized):
| metric | baseline DSC | with this change | delta |
|---|---|---|---|
| paliRateLimiter admissions / query | 1.50 | 1.20 | -20% |
| block-manager bytes read / query | 61.8 | 41.7 | -33% |
| CPU user / query (µs) | 37.8 | 36.2 | -4% |
Read latency on the affected sys-perf task improved from 835 to 801 (target 776); throughput improved from 153,220 to 159,522 ops/s (target 164,818). The optimization closes roughly half of the DSC-vs-ASC regression on its own.
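The "roughly half" figure can be checked from the numbers above. The helper below is a hypothetical illustration (gap_closed is not part of any tool used for the measurements) and assumes the quoted targets mark the level at which the regression would be fully closed.

```c
/*
 * Fraction of the baseline-to-target gap closed by the new result.
 * Works for metrics where lower is better (latency) and where higher
 * is better (throughput), because both differences share a sign.
 */
static double
gap_closed(double baseline, double with_change, double target)
{
    return (with_change - baseline) / (target - baseline);
}
```

With the figures quoted above, gap_closed(835, 801, 776) is about 0.58 for read latency and gap_closed(153220, 159522, 164818) is about 0.54 for throughput.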
Related
Related to BF-43331 — [Atlas Infinite vs Atlas Core] Regression in YCSB in-cache and out-of-cache reads.