Ignore eviction target/trigger in follower mode


    • Storage Engines, Storage Engines - Transactions
    • SE Transactions - 2025-10-10
    • 3

      During internal testing of disaggregated storage, we have hit several issues where a standby node running in follower mode has stalled due to cache pressure.

      The typical scenario is that oplog application inserts/updates records that land in the ingest tables on the follower node. Because the follower can't write to shared storage, this dirty data has to remain in cache until it can either be pruned (after picking up a new checkpoint) or until it can be written into the shared table (after stepping up to become the primary/leader).

      But we currently use the same cache eviction targets and triggers in follower mode as in leader mode, even though we can't evict dirty data from a follower. This means that inserting records equal to 10% of the cache (the default update trigger) will cause a follower to stall: all of the oplog applier threads get pulled in to help with eviction, but since nothing can actually be evicted they get stuck, and the node effectively hangs.
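      For reference, these thresholds are all exposed through the wiredtiger_open cache configuration. Below is a minimal sketch of an open call that spells them out explicitly; the percentages are illustrative (only the updates trigger of 10% reflects the default mentioned above), and the home path and cache size are placeholders.

      #include <stdlib.h>
      #include <wiredtiger.h>

      int
      main(void)
      {
          WT_CONNECTION *conn;
          /* Eviction thresholds, each expressed as a percentage of the cache. */
          const char *config =
              "create,cache_size=1GB,"
              "eviction_target=80,eviction_trigger=95,"
              "eviction_dirty_target=5,eviction_dirty_trigger=20,"
              "eviction_updates_target=2,eviction_updates_trigger=10";

          if (wiredtiger_open("WT_HOME", NULL, config, &conn) != 0)
              return (EXIT_FAILURE);

          /* ... drive oplog-style inserts/updates to build up dirty content ... */

          return (conn->close(conn, NULL) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
      }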

      This ticket is intended as a short-term fix to enable more standby testing. For now, we should not try to evict dirty or update content on a follower node. That risks filling (or overfilling) the cache with dirty data, so we should also add a failure mode where we panic if the cache is full of dirty data; better to have a clear failure with a clear cause than to have the system mysteriously hang.

      Definition of done:

      • Application threads are not used to help with dirty or update eviction when WiredTiger is in follower mode.
        • This could be implemented by dynamically adjusting the trigger/target values when the system switches between follower and leader modes, or by changing the checks for using application threads for eviction, or something else (see the sketch after this list).
      • There are no changes to clean eviction behavior. WT should still evict clean pages if the cache is full.
      • If the cache has a large amount of dirty or update content (95%?) WT should log a clear message about the problem and panic.
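
      Below is a standalone sketch of the second approach above (changing the check that drafts application threads into dirty/update eviction), combined with the panic on runaway dirty content. The struct, the function names, and the 10% / 95% constants are invented for illustration only and are not WiredTiger's actual internals.

      #include <inttypes.h>
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* The "95%?" threshold from the definition of done above. */
      #define DIRTY_PANIC_PCT 95

      struct cache_state {
          uint64_t cache_bytes; /* Configured cache size. */
          uint64_t dirty_bytes; /* Dirty plus update content pinned in cache. */
          bool is_follower;     /* True while running in follower mode. */
      };

      /*
       * Decide whether an application (oplog applier) thread should help with
       * dirty/update eviction. In follower mode the answer is always no: that
       * content can't be written to shared storage, so the thread would stall.
       * Clean eviction is untouched and handled elsewhere.
       */
      static bool
      app_thread_should_evict_dirty(const struct cache_state *c)
      {
          if (c->is_follower)
              return (false);
          /* Leader mode: unchanged; 10 stands in for the update trigger. */
          return (c->dirty_bytes * 100 >= c->cache_bytes * 10);
      }

      /*
       * Fail loudly if dirty/update content overruns a follower's cache:
       * a clear panic with a clear cause beats a mysterious hang.
       */
      static void
      follower_dirty_panic_check(const struct cache_state *c)
      {
          if (!c->is_follower)
              return;
          if (c->dirty_bytes * 100 >= c->cache_bytes * DIRTY_PANIC_PCT) {
              fprintf(stderr,
                  "follower cache overrun: %" PRIu64 " of %" PRIu64
                  " bytes are dirty/update content and cannot be evicted\n",
                  c->dirty_bytes, c->cache_bytes);
              abort();
          }
      }

      int
      main(void)
      {
          struct cache_state c = {
              .cache_bytes = 1024 * 1024, .dirty_bytes = 200 * 1024, .is_follower = true};

          printf("draft app thread for dirty eviction? %s\n",
              app_thread_should_evict_dirty(&c) ? "yes" : "no");
          follower_dirty_panic_check(&c); /* No panic at roughly 20% dirty. */
          return (EXIT_SUCCESS);
      }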

      Note: the long-term fix here isn't obvious (there are a variety of ways we could relieve pressure on the follower, but they also have downstream consequences for things like failover time and the efficiency of ingest draining during checkpoint pickup). Hence this ticket to allow more testing while we consider a more holistic solution.

            Assignee:
            Chenhao Qu
            Reporter:
            Keith Smith
            Votes:
            0
            Watchers:
            6
