Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: WT12.0.0, 9.0.0-rc0
Affects Version/s: None
Component/s: DHandles
Labels:
None

Assigned Teams:

Storage Engines - Persistence
Total Hours with Assigned Team:
1,137.756
Sprint:
SE Persistence backlog
Story Points:
None

Problem

On a disaggregated storage secondary, the dhandle_lock_blocked (thread-yield / "data handle lock yielded") counter reaches 25.6 million events over a 15-minute mixed-workload run. This is the dominant source of the 29–38 ms average command latency observed on the secondary, well above the ~685 µs average read latency.

Root Cause

__disagg_apply_checkpoint_meta (src/conn/conn_layered.c) calls __wti_conn_dhandle_outdated() for every table on every checkpoint pickup, twice per table (once for the old checkpoint dhandle, once for the live-btree dhandle). This marks all matching dhandles WT_DHANDLE_OUTDATED while holding the schema lock.

After the schema lock is released, the next access by any session finds its cached dhandle marked OUTDATED in __session_find_dhandle (src/session/session_dhandle.c:139-143), discards it, and attempts to reopen. Reopening a not-yet-open dhandle requires an exclusive write lock (__wt_session_lock_dhandle). Because all ~231 concurrent reader sessions are invalidated simultaneously for all tables, they race to acquire the same exclusive write locks. Each failed attempt spins through the yield loop at src/session/session_dhandle.c:280-285 and increments dhandle_lock_blocked.

With 30 checkpoints applied in 15 minutes (~1 per 16 s), the stampede recurs on every checkpoint across all tables and all concurrent sessions, accumulating to 25.6M yield events.
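The yield accounting described above can be illustrated with a small standalone model. This is a sketch only, not WiredTiger code: `model_dhandle`, `model_lock_exclusive`, and the bounded `max_attempts` parameter are hypothetical names invented for illustration, and the real loop in session_dhandle.c may differ in structure. It shows how every failed exclusive-lock attempt adds one event to a counter playing the role of dhandle_lock_blocked.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a dhandle: an exclusive flag plus a yield counter. */
typedef struct {
    atomic_bool exclusive;       /* true while some session holds the handle exclusively */
    atomic_uint_fast64_t yields; /* plays the role of dhandle_lock_blocked */
} model_dhandle;

static void model_dhandle_init(model_dhandle *dh) {
    assert(dh != NULL);
    atomic_init(&dh->exclusive, false);
    atomic_init(&dh->yields, 0);
}

/*
 * Try to take the handle exclusively, yielding after every failed attempt.
 * Returns true on success, false after max_attempts failed attempts.
 * (The bound exists only so the model terminates; it is an assumption.)
 */
static bool model_lock_exclusive(model_dhandle *dh, unsigned max_attempts) {
    for (unsigned i = 0; i < max_attempts; i++) {
        /* atomic_exchange returns the previous value: false means we won. */
        if (!atomic_exchange(&dh->exclusive, true))
            return true;
        atomic_fetch_add(&dh->yields, 1); /* one "lock blocked" event */
        sched_yield();
    }
    return false;
}

static void model_unlock_exclusive(model_dhandle *dh) {
    atomic_store(&dh->exclusive, false);
}
```

In this model, a session that loses the race contributes one counter increment per retry, so the total grows with both the number of waiting sessions and the time the winner spends holding the handle.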

Evidence (FTDC, 2026-06-02 disagg mixed-workload run)

Metric	Value
`dhandle_lock_blocked` (secondary)	25,621,687 events
Checkpoints applied from primary	30 (~1 every 16 s)
`apply checkpoint metadata most recent time`	200 ms (last snapshot)
Average command latency (secondary)	29,606–38,443 µs
Concurrent connections (secondary)	~231
`checkpoint lock application thread wait time`	70.4 s cumulative (separate issue)
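As a back-of-envelope reading of the figures above, the yield count can be broken down per checkpoint and per session. The sketch below assumes the events are spread evenly across the 30 checkpoints and the ~231 sessions, which the FTDC data does not confirm; `model_run` and the helper names are hypothetical.

```c
#include <assert.h>

/* Figures copied from the evidence table; the session count is approximate. */
typedef struct {
    double yield_events; /* dhandle_lock_blocked on the secondary */
    double checkpoints;  /* checkpoints applied from the primary */
    double sessions;     /* concurrent connections on the secondary */
} model_run;

static const model_run ftdc_run = {25621687.0, 30.0, 231.0};

/* Average yield events per checkpoint pickup. */
static double yields_per_checkpoint(const model_run *r) {
    assert(r->checkpoints > 0);
    return r->yield_events / r->checkpoints;
}

/* Average yield events per session per checkpoint, assuming an even spread. */
static double yields_per_session_per_checkpoint(const model_run *r) {
    assert(r->sessions > 0);
    return yields_per_checkpoint(r) / r->sessions;
}
```

Under those assumptions, each checkpoint pickup accounts for roughly 854,000 yields, or about 3,700 yields per session per pickup.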

Impact

Command latency on the secondary is roughly 43–56× the raw read latency (29,606–38,443 µs against ~685 µs). Every MongoDB driver heartbeat, cursor getMore, and aggregate command is affected. The contention recurs on each checkpoint pickup regardless of read workload size.

Suggested Direction

Instead of invalidating all dhandles globally on every checkpoint pickup, consider:

- Tracking which dhandles actually changed (by checkpoint name diff) and only marking those outdated, rather than marking the live-btree dhandle for every table unconditionally (the TODO comment at src/conn/conn_layered.c:508-513 already notes this should be done at step-up/step-down).
- Coordinating re-opens so that only one thread per dhandle does the open and others wait on a condition rather than spinning.
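Both ideas can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not a proposed patch: every `model_*` name is hypothetical, the checkpoint names are compared as plain strings, and the single-opener logic uses POSIX mutex and condition-variable primitives rather than WiredTiger's own locking layer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Idea 1: a handle needs invalidating only if its checkpoint name changed. */
static bool model_needs_outdate(const char *old_ckpt, const char *new_ckpt) {
    assert(old_ckpt != NULL && new_ckpt != NULL);
    return strcmp(old_ckpt, new_ckpt) != 0;
}

/* Idea 2: one opener per handle; other threads sleep instead of spinning. */
typedef enum { MODEL_CLOSED, MODEL_OPENING, MODEL_OPEN } model_state;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed; /* signalled whenever state leaves MODEL_OPENING */
    model_state state;
} model_handle;

static void model_handle_init(model_handle *h) {
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->changed, NULL);
    h->state = MODEL_CLOSED;
}

/*
 * Returns true if the caller claimed the open and must call model_finish_open().
 * Returns false if the handle is already open, after sleeping on the condition
 * variable if another thread was in the middle of opening it.
 */
static bool model_begin_open(model_handle *h) {
    bool is_opener = false;
    pthread_mutex_lock(&h->lock);
    for (;;) {
        if (h->state == MODEL_CLOSED) {
            h->state = MODEL_OPENING; /* claim the open */
            is_opener = true;
            break;
        }
        if (h->state == MODEL_OPEN)
            break;
        pthread_cond_wait(&h->changed, &h->lock); /* someone else is opening */
    }
    pthread_mutex_unlock(&h->lock);
    return is_opener;
}

/* Called by the opener once the (simulated) open has completed. */
static void model_finish_open(model_handle *h) {
    pthread_mutex_lock(&h->lock);
    h->state = MODEL_OPEN;
    pthread_cond_broadcast(&h->changed); /* wake every waiter at once */
    pthread_mutex_unlock(&h->lock);
}
```

With this shape, tables whose checkpoint name is unchanged are never invalidated, and for the ones that are, waiters block once and are woken by a single broadcast, so they would not contribute to a yield counter at all.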

related to

WT-17814 Improve re-opening dhandles given many concurrent readers

Backlog

Assignee: Peter Macko
Reporter: Chenhao Qu
Votes: 0
Watchers: 2

Created: Jun 04 2026 11:11:30 PM UTC
Updated: Jun 17 2026 08:26:02 AM UTC
Resolved: Jun 15 2026 04:51:40 AM UTC
