Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Cache and Eviction
Labels: None
Assigned Teams: Storage Engines, Storage Engines - Transactions
Total Hours with Assigned Team: 51.034
Sprint: None
Story Points: None

Motivation

On large caches (18 GB+) under high update pressure, the eviction walker is
the candidate-discovery bottleneck: it samples the btree hoping to find the
oldest dirty pages before pressure reaches the trigger, and when dirty pages
are generated quickly it cannot keep up. The WT-17234 investigation
demonstrated this across eight tuning attempts (deeper sampling, walker
persistence, dominance gating, cache-fill gates): every variant that helped
one workload class hurt another. FTDC on the failing approaches showed the
walker doing 1.83x the work of baseline while delivering 32% lower insert
throughput.

This ticket proposes and prototypes a structurally different approach: drive
dirty-candidate discovery from the write path instead of from walker
sampling. The walker stops being the throughput-limiting component; it only
has to pop entries the producer has already identified.

Design

Producer (modify path)

Every successful cursor modify records the dirty leaf ref into a per-btree
ring. The ring is sized proportionally to the cache (500 slots per GB,
clamped between 4096 and 262144 slots). Insert is a trylock plus an atomic
head advance, so producer contention degrades gracefully: a missed insert is
a missed performance hint, not a correctness issue. The walker remains a
safety net.
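
The sizing rule and the drop-on-contention insert can be sketched as below.
This is a minimal illustration, not the prototype's code: the dirty_ring
type, its fields, and the dirty_ring_* function names are hypothetical, the
sizing truncates the cache size to whole GB (an assumption), and a C11
atomic_flag stands in for whatever trylock the branch actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS_PER_GB 500u
#define RING_MIN_SLOTS 4096u
#define RING_MAX_SLOTS 262144u

/* Hypothetical ring layout; not the prototype's actual structure. */
typedef struct {
    atomic_flag lock;               /* trylock guarding one insert at a time */
    atomic_uint_fast64_t head;      /* monotonically increasing insert position */
    atomic_uint_fast64_t contended; /* inserts dropped because the lock was busy */
    uint64_t inserts, overwrites;   /* updated only while holding the lock */
    size_t nslots;
    _Atomic(void *) *slots;         /* NULL means the slot is empty */
} dirty_ring;

/* 500 slots per whole GB of cache, clamped to [4096, 262144]. */
static size_t
dirty_ring_slots(uint64_t cache_bytes)
{
    uint64_t slots = (cache_bytes >> 30) * RING_SLOTS_PER_GB;

    if (slots < RING_MIN_SLOTS)
        return (RING_MIN_SLOTS);
    if (slots > RING_MAX_SLOTS)
        return (RING_MAX_SLOTS);
    return ((size_t)slots);
}

/* Prepare an empty ring over caller-provided slot storage. */
static void
dirty_ring_init(dirty_ring *ring, _Atomic(void *) *slots, size_t nslots)
{
    assert(ring != NULL && slots != NULL && nslots > 0);
    atomic_flag_clear(&ring->lock);
    atomic_init(&ring->head, 0);
    atomic_init(&ring->contended, 0);
    ring->inserts = ring->overwrites = 0;
    ring->nslots = nslots;
    ring->slots = slots;
    for (size_t i = 0; i < nslots; i++)
        atomic_init(&slots[i], NULL);
}

/*
 * Record a dirty ref. Returns false without blocking if another producer
 * holds the lock: the hint is dropped and the caller carries on.
 */
static bool
dirty_ring_insert(dirty_ring *ring, void *ref)
{
    if (atomic_flag_test_and_set_explicit(&ring->lock, memory_order_acquire)) {
        atomic_fetch_add_explicit(&ring->contended, 1, memory_order_relaxed);
        return (false);
    }
    uint64_t pos = atomic_fetch_add_explicit(&ring->head, 1, memory_order_relaxed);
    void *old = atomic_exchange_explicit(
      &ring->slots[pos % ring->nslots], ref, memory_order_release);
    if (old != NULL)
        ++ring->overwrites; /* wrapped onto an entry that was never drained */
    ++ring->inserts;
    atomic_flag_clear_explicit(&ring->lock, memory_order_release);
    return (true);
}
```

Under these assumptions an 18 GB cache gets 18 * 500 = 9000 slots, and any
cache below roughly 8.2 GB sits at the 4096-slot floor.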

Consumer (eviction walker)

The walker drains the ring at the start of each per-btree visit. Each ref is
guarded by a hazard pointer (__wt_hazard_set) before dereferencing
ref->page, then handed to __evict_try_queue_page -- the same gate
the tree walker uses. Stale refs (page freed, split in progress, already
queued, etc.) are skipped. Whatever the drain does not fill, the tree walker
handles as before.
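
The drain loop described above has roughly the following shape. This is a
single-threaded toy, not WiredTiger code: toy_ref, toy_hazard_set,
toy_hazard_clear, and toy_try_queue are hypothetical stand-ins for the real
ref, the hazard-pointer calls, and __evict_try_queue_page, and the stale
checks are reduced to a state enum and a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ref states for illustration only. */
typedef enum { TOY_REF_IN_MEMORY, TOY_REF_FREED, TOY_REF_SPLITTING } toy_ref_state;

typedef struct {
    toy_ref_state state;
    bool queued; /* already on an eviction queue */
    bool hazard; /* a hazard pointer is currently held */
} toy_ref;

typedef struct {
    uint64_t scanned, hit, stale;
} toy_drain_stats;

/* Stand-in for taking a hazard pointer: fails unless the page is in memory. */
static bool
toy_hazard_set(toy_ref *ref)
{
    if (ref->state != TOY_REF_IN_MEMORY)
        return (false);
    ref->hazard = true;
    return (true);
}

static void
toy_hazard_clear(toy_ref *ref)
{
    ref->hazard = false;
}

/* Stand-in for the shared queueing gate: rejects already-queued pages. */
static bool
toy_try_queue(toy_ref *ref)
{
    if (ref->queued)
        return (false);
    ref->queued = true;
    return (true);
}

/*
 * Drain up to "want" candidates from the slots. Every examined slot is
 * emptied; refs that cannot be protected or queued are counted as stale.
 * Returns the number queued, so the caller knows how much the tree walk
 * still has to find.
 */
static size_t
toy_drain(toy_ref **slots, size_t nslots, size_t want, toy_drain_stats *st)
{
    size_t queued = 0;

    for (size_t i = 0; i < nslots && queued < want; i++) {
        toy_ref *ref = slots[i];
        if (ref == NULL)
            continue;
        slots[i] = NULL;
        ++st->scanned;
        if (!toy_hazard_set(ref)) {
            ++st->stale;
            continue;
        }
        if (toy_try_queue(ref)) {
            ++st->hit;
            ++queued;
        } else
            ++st->stale;
        toy_hazard_clear(ref);
    }
    return (queued);
}
```

The sketch deliberately ignores concurrency; as noted under "Current
prototype status", hazard pointers alone did not make the real drain safe
against concurrent splits.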

Observability (FTDC)

Six new CacheStat entries for rate and safety metrics:
* cache_eviction_dirty_index_insert -- producer rate
* cache_eviction_dirty_index_insert_contended -- trylock drops
* cache_eviction_dirty_index_overwrite -- ring wrap-around (producer
faster than consumer)
* cache_eviction_dirty_index_scanned -- drain slots examined
* cache_eviction_dirty_index_hit -- refs successfully queued
* cache_eviction_dirty_index_stale -- slots filtered as stale
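
One way these counters might be combined when reading FTDC is sketched
below. The ratios are an illustration and are not defined by the ticket; the
struct and function names are hypothetical, and the sketch assumes the
insert counter counts successful inserts only, with trylock drops counted
separately.

```c
#include <stdint.h>

/* Hypothetical snapshot of the six counters (deltas over one interval). */
typedef struct {
    uint64_t insert, insert_contended, overwrite;
    uint64_t scanned, hit, stale;
} dirty_index_stats;

/* Divide two counters, treating an idle interval as a ratio of zero. */
static double
toy_ratio(uint64_t num, uint64_t den)
{
    return (den == 0 ? 0.0 : (double)num / (double)den);
}

/* Fraction of producer attempts dropped because the trylock was busy. */
static double
producer_drop_ratio(const dirty_index_stats *s)
{
    return (toy_ratio(s->insert_contended, s->insert + s->insert_contended));
}

/* Fraction of inserts that overwrote an entry the drain never consumed. */
static double
overwrite_ratio(const dirty_index_stats *s)
{
    return (toy_ratio(s->overwrite, s->insert));
}

/* Fraction of examined slots that became queued eviction candidates. */
static double
drain_hit_ratio(const dirty_index_stats *s)
{
    return (toy_ratio(s->hit, s->scanned));
}
```

A high overwrite ratio would suggest the producer is outrunning the
consumer, while a low drain hit ratio would suggest most recorded refs are
stale by the time the walker reaches them.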

Current prototype status

The prototype is on the wt-dirty-index branch, rebased directly on
develop (not piggybacked on the WT-17234 eviction_queue_scale branch
-- a separate, clean baseline).

The producer and drain code are fully implemented. The drain is gated behind
a compile-time WTI_DIRTY_INDEX_DRAIN_ENABLED flag, default 0. Format
stress surfaced key-order corruption when the drain was enabled
unconditionally: hazard pointers alone do not coordinate with concurrent
page splits that mutate key ordering. Getting consumer safety right
requires deeper integration with __wt_tree_walk_count semantics
(follow-up work on this ticket).

With the drain disabled the producer still fires, letting us measure the
candidate production rate and producer overhead without risking data
corruption.
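
The compile-time gate can be pictured as follows. Only the flag name and its
default of 0 come from the ticket; the toy_walk_target helper is
hypothetical and simply shows that, with the flag off, the tree walk remains
responsible for the full candidate target.

```c
#include <stddef.h>

#ifndef WTI_DIRTY_INDEX_DRAIN_ENABLED
#define WTI_DIRTY_INDEX_DRAIN_ENABLED 0 /* drain compiled out by default */
#endif

/*
 * Hypothetical helper: given how many candidates the ring could supply and
 * the target for this visit, return how many the tree walk must still find.
 */
static size_t
toy_walk_target(size_t ring_candidates, size_t target)
{
#if WTI_DIRTY_INDEX_DRAIN_ENABLED
    size_t from_ring = ring_candidates < target ? ring_candidates : target;
#else
    size_t from_ring = 0; /* producer still records, nothing is consumed */
    (void)ring_candidates;
#endif
    return (target - from_ring);
}
```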

What this ticket is for

* Drive the push-model approach to completion on its own branch.
* Make the drain safe to enable in production (hazard + split-coordination).
* Measure -- and if positive, ship -- the end-to-end win on YCSB load and
large-cache / high-dirty workloads where the walker alone cannot keep up.

Related

* WT-17234 -- the eight-approach investigation that established the walker
itself is the bottleneck at scale. This ticket carries the lessons forward
into a different class of fix.
* WT-15538 -- umbrella ticket for slow eviction under high update ratio.
* WT-16529 -- queue usage / empty-queue investigation (pull-side tuning,
complementary).
* WT-16665 -- dynamic queue resize (pull-side tuning, complementary).

Open questions / follow-ups

* Consumer safety under splits. The current drain races with split
completion on key order; the drain must either coordinate with the split
lock or filter split-in-progress refs. Needs investigation; the likely
answer is to check a ref->home generation or a similar split-safe identifier.
* Whether the ring should also admit dirty-side reads that trigger
WT_PAGE_EVICT_LRU_URGENT (currently only cursor modify paths feed it).
* Interaction with disaggregated storage eviction constraints
(materialization frontier, PALI page-server admission). The drain path
re-uses __evict_try_queue_page, so those gates should already fire,
but this needs explicit sys-perf testing.
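
The generation-check idea from the first open question could look something
like the sketch below. Everything here is hypothetical: the toy_parent
split_gen counter, the toy_slot layout, and both functions are invented for
illustration, and the ticket leaves open which split-safe identifier is
actually appropriate. The sketch also does not address a ref being freed
while it sits in the ring.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parent page carrying a counter bumped on every split. */
typedef struct {
    uint64_t split_gen;
} toy_parent;

typedef struct {
    toy_parent *home; /* parent the ref currently belongs to */
} toy_ref;

/* Ring slot that remembers the parent and its generation at insert time. */
typedef struct {
    toy_ref *ref;
    toy_parent *home;
    uint64_t split_gen;
} toy_slot;

/* Producer side: capture the ref together with its split-safe identity. */
static toy_slot
toy_slot_record(toy_ref *ref)
{
    toy_slot s = {ref, ref->home, ref->home->split_gen};
    return (s);
}

/*
 * Consumer side: treat the slot as stale if the ref has moved to another
 * parent or the parent has split since the slot was recorded.
 */
static bool
toy_slot_is_current(const toy_slot *s)
{
    return (s->ref != NULL && s->ref->home == s->home &&
      s->home->split_gen == s->split_gen);
}
```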

Branch

Primary branch: wt-dirty-index (rebased on develop).

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Haribabu Kommi
Votes: 0
Watchers: 1

Created: Apr 19 2026 04:19:09 AM UTC
Updated: Apr 21 2026 03:24:34 PM UTC
Resolved: Apr 21 2026 03:24:35 PM UTC

Push-model dirty-page index for eviction candidate discovery
