- Type: Improvement
- Resolution: Duplicate
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Cache and Eviction
- Storage Engines, Storage Engines - Transactions
- 51.034
Motivation
On large caches (18 GB+) under high update pressure, the eviction walker is
the candidate-discovery bottleneck: it samples the btree hoping to find the
oldest dirty pages before cache pressure reaches the eviction trigger, and
when dirty pages are generated quickly it cannot keep up. The WT-17234
investigation demonstrated this across eight tuning attempts (deeper
sampling, walker persistence, dominance gating, cache-fill gates): every
variant that helped one workload class hurt another. FTDC on the failing
approaches showed the walker doing 1.83x the work of baseline while
delivering 32% lower insert throughput.
This ticket proposes and prototypes a structurally different approach: drive
dirty-candidate discovery from the write path instead of from walker
sampling. The walker stops being the throughput-limiting component; it only
has to pop entries the producer has already identified.
Design
Producer (modify path)
Every successful cursor modify records the dirty leaf ref into a per-btree
ring. The ring is sized proportionally to the cache (500 slots per GB,
clamped between 4096 and 262144). Insert is a trylock plus an atomic head
advance, so producer contention degrades gracefully: a missed insert is a
lost performance hint, not a correctness issue. The walker remains a safety
net.
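The sizing rule and the trylock insert described above can be sketched as follows. This is a minimal illustration, not the prototype's code: the dirty_ring structure, its field names and the dirty_ring_* functions are hypothetical, void * stands in for a WT_REF pointer, and counting an overwrite whenever a slot still holds an undrained entry is an assumption about how wrap-around is detected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define RING_SLOTS_PER_GB 500u
#define RING_MIN_SLOTS 4096u
#define RING_MAX_SLOTS 262144u
#define ONE_GB (UINT64_C(1) << 30)

/* Hypothetical ring layout; the prototype's real structure is not shown in this ticket. */
typedef struct {
    atomic_flag lock;             /* held only for the duration of one insert */
    _Atomic uint64_t head;        /* count of inserts; next slot is head % slots */
    uint32_t slots;               /* ring capacity */
    _Atomic(void *) *refs;        /* stand-in for WT_REF pointers; NULL means empty */
    _Atomic uint64_t contended;   /* inserts dropped because the trylock failed */
    _Atomic uint64_t overwritten; /* inserts that replaced an undrained entry */
} dirty_ring;

/* 500 slots per GB of cache, clamped to [4096, 262144]. */
static uint32_t
dirty_ring_slots(uint64_t cache_bytes)
{
    uint64_t n;

    /* Saturate first so the multiplication below cannot overflow. */
    if (cache_bytes / ONE_GB >= RING_MAX_SLOTS)
        return (RING_MAX_SLOTS);
    n = cache_bytes * RING_SLOTS_PER_GB / ONE_GB;
    if (n < RING_MIN_SLOTS)
        n = RING_MIN_SLOTS;
    if (n > RING_MAX_SLOTS)
        n = RING_MAX_SLOTS;
    return ((uint32_t)n);
}

/* Allocate an empty ring sized for the given cache; returns false on allocation failure. */
static bool
dirty_ring_init(dirty_ring *r, uint64_t cache_bytes)
{
    r->slots = dirty_ring_slots(cache_bytes);
    r->refs = calloc(r->slots, sizeof(_Atomic(void *)));
    atomic_flag_clear(&r->lock);
    atomic_init(&r->head, 0);
    atomic_init(&r->contended, 0);
    atomic_init(&r->overwritten, 0);
    return (r->refs != NULL);
}

/* Best-effort insert: never blocks; returns false if another producer holds the lock. */
static bool
dirty_ring_insert(dirty_ring *r, void *ref)
{
    uint64_t head;
    void *old;

    assert(ref != NULL);
    if (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire)) {
        /* Contended: drop the hint rather than stall the modify path. */
        atomic_fetch_add_explicit(&r->contended, 1, memory_order_relaxed);
        return (false);
    }
    head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Publish the ref; a non-NULL previous value means the consumer fell behind. */
    old = atomic_exchange_explicit(&r->refs[head % r->slots], ref, memory_order_release);
    if (old != NULL)
        atomic_fetch_add_explicit(&r->overwritten, 1, memory_order_relaxed);
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
    return (true);
}
```

Under this rule an 18 GB cache gets 9000 slots, while a 1 GB cache is raised to the 4096-slot floor. Dropping the insert on a failed trylock keeps the modify path from ever waiting on the ring, which is the graceful degradation the design relies on.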
Consumer (eviction walker)
The walker drains the ring at the start of each per-btree visit. Each ref is
guarded by a hazard pointer (__wt_hazard_set) before dereferencing
ref->page, then handed to __evict_try_queue_page -- the same gate
the tree walker uses. Stale refs (page freed, split in progress, already
queued, etc.) are skipped. Whatever the drain does not fill, the tree walker
handles as before.
Observability (FTDC)
Six new CacheStat entries for rate and safety metrics:
* cache_eviction_dirty_index_insert -- producer rate
* cache_eviction_dirty_index_insert_contended -- trylock drops
* cache_eviction_dirty_index_overwrite -- ring wrap-around (producer
faster than consumer)
* cache_eviction_dirty_index_scanned -- drain slots examined
* cache_eviction_dirty_index_hit -- refs successfully queued
* cache_eviction_dirty_index_stale -- slots filtered as stale
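When reading these counters in FTDC, a few derived ratios are probably more telling than the raw rates. The helper below is a hypothetical sketch: the struct simply mirrors the six stat names, and it assumes that the insert counter records successful inserts only, so that insert plus insert_contended gives the number of attempts. The ticket does not state this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deltas of the six counters over one sampling interval (field names shortened). */
typedef struct {
    uint64_t insert;           /* cache_eviction_dirty_index_insert */
    uint64_t insert_contended; /* cache_eviction_dirty_index_insert_contended */
    uint64_t overwrite;        /* cache_eviction_dirty_index_overwrite */
    uint64_t scanned;          /* cache_eviction_dirty_index_scanned */
    uint64_t hit;              /* cache_eviction_dirty_index_hit */
    uint64_t stale;            /* cache_eviction_dirty_index_stale */
} dirty_index_stats;

/* Division that reports 0 for an idle interval instead of dividing by zero. */
static double
safe_ratio(uint64_t num, uint64_t den)
{
    return (den == 0 ? 0.0 : (double)num / (double)den);
}

/* Fraction of drained slots that became eviction candidates. */
static double
drain_hit_ratio(const dirty_index_stats *s)
{
    assert(s != NULL);
    return (safe_ratio(s->hit, s->scanned));
}

/* Fraction of drained slots that were filtered as stale. */
static double
drain_stale_ratio(const dirty_index_stats *s)
{
    assert(s != NULL);
    return (safe_ratio(s->stale, s->scanned));
}

/* Fraction of insert attempts lost to trylock contention (assumes insert = successes). */
static double
insert_drop_ratio(const dirty_index_stats *s)
{
    assert(s != NULL);
    return (safe_ratio(s->insert_contended, s->insert + s->insert_contended));
}

/* Fraction of successful inserts that overwrote an entry the consumer never drained. */
static double
overwrite_ratio(const dirty_index_stats *s)
{
    assert(s != NULL);
    return (safe_ratio(s->overwrite, s->insert));
}
```

A high overwrite ratio would suggest the ring is too small or the drain too infrequent. A high stale ratio would suggest refs are sitting in the ring long enough to be invalidated before the walker reaches them.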
Current prototype status
The prototype is on the wt-dirty-index branch, rebased directly on
develop (not piggybacked on the WT-17234 eviction_queue_scale branch
-- a separate, clean baseline).
Producer + drain code is fully implemented. Drain is gated behind a
compile-time WTI_DIRTY_INDEX_DRAIN_ENABLED flag, default 0. Format
stress surfaced key-order corruption when the drain was enabled
unconditionally: hazard pointers alone do not coordinate with concurrent
page splits that mutate key ordering. Getting the consumer safety right
requires deeper integration with __wt_tree_walk_count semantics
(follow-up work on this ticket).
With the drain disabled the producer still fires, letting us measure the
candidate production rate and producer overhead without risking data
corruption.
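The producer-on, drain-off arrangement can be pictured as below. Only the WTI_DIRTY_INDEX_DRAIN_ENABLED name and its default of 0 come from this ticket; the counters and both functions are hypothetical and merely illustrate how the consumer is compiled out while the producer keeps running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Drain is opt-in at build time; the default keeps it compiled out. */
#ifndef WTI_DIRTY_INDEX_DRAIN_ENABLED
#define WTI_DIRTY_INDEX_DRAIN_ENABLED 0
#endif

/* Hypothetical per-btree counters standing in for the real ring and stats. */
typedef struct {
    uint64_t pending; /* entries recorded by the producer, not yet drained */
    uint64_t drained; /* entries consumed by the walker */
} dirty_index_counters;

/* Producer side: always active, so its rate and overhead can be measured. */
static void
record_dirty_ref(dirty_index_counters *c)
{
    assert(c != NULL);
    c->pending++;
}

/* Consumer side: returns true only if the drain ran during this visit. */
static bool
walker_visit_btree(dirty_index_counters *c)
{
    assert(c != NULL);
#if WTI_DIRTY_INDEX_DRAIN_ENABLED
    c->drained += c->pending;
    c->pending = 0;
    return (true);
#else
    /* Drain compiled out: the tree walker alone finds candidates. */
    return (false);
#endif
}
```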
What this ticket is for
- Drive the push-model approach to completion on its own branch.
- Make the drain safe to enable in production (hazard + split-coordination).
- Measure -- and if positive, ship -- the end-to-end win on YCSB load and
large-cache / high-dirty workloads where the walker alone cannot keep up.
Related
- WT-17234 -- the eight-approach investigation that established the walker
itself is the bottleneck at scale. This ticket carries the lessons forward
into a different class of fix.
- WT-15538 -- umbrella ticket for slow eviction under high update ratio.
- WT-16529 -- queue usage / empty-queue investigation (pull-side tuning,
complementary).
- WT-16665 -- dynamic queue resize (pull-side tuning, complementary).
Open questions / follow-ups
- Consumer safety under splits. The current drain races with split
completion on key order; the drain must either coordinate with the split
lock or filter split-in-progress refs. Needs investigation; likely the
correct answer is to check ref->home generation or a similar split-safe
identifier.
- Whether the ring should also admit dirty-side reads that trigger
WT_PAGE_EVICT_LRU_URGENT (currently only cursor modify paths feed it).
- Interaction with disaggregated storage eviction constraints
(materialization frontier, PALI page-server admission). The drain path
re-uses __evict_try_queue_page, so those gates should already fire,
but this needs explicit sys-perf testing.
Branch
Primary branch: wt-dirty-index (rebased on develop).