- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Tiered Storage
- None
- Storage Engines - Persistence
- 123.591
- SE Persistence - 2026-05-08
- None
Issue Summary
__conn_cleanup_chunk_cache (introduced in WT-17169) calls __wt_metadata_search on conn->default_session after __wti_connection_workers has returned. At that point background threads (eviction, handle sweep, log manager, checkpoint, etc.) are already running and can modify session->dhandle concurrently. This leaves the session's cached metadata cursor in an invalid state, and UBSAN reports a null pointer dereference when S2BT(session) dereferences a NULL session->dhandle.
src/btree/bt_read.c:500:13: runtime error: member access within null pointer of type 'WT_DATA_HANDLE'
#0 __wt_page_in_func src/btree/bt_read.c:500
#1 __wt_page_swap_func src/include/btree_inline.h:2646
#2 __wt_row_search src/btree/row_srch.c:576
#3 __cursor_row_search src/btree/bt_cursor.c:513
#4 __wt_btcur_search src/btree/bt_cursor.c:765
#5 __curfile_search src/cursor/cur_file.c:314
#6 __wt_metadata_search src/meta/meta_table.c:340
#7 __conn_cleanup_chunk_cache src/conn/conn_api.c:1558
Context
- Affected path: src/conn/conn_api.c, function wiredtiger_open
- Root cause: conn->default_session->dhandle is unsafe to use after __wti_connection_workers returns, because background threads are now running. The codebase already encodes this invariant: immediately after wiredtiger_open finishes, WT_SESSION_NO_DATA_HANDLES is set on the default session (conn_api.c:3648) and enforced by an assertion in session_dhandle.c:932. The bug is that __conn_cleanup_chunk_cache is called at line 3624, after workers start (line 3621) but before this flag is set.
- Crash mechanism: CURSOR_API_CALL in __curfile_search assigns session->dhandle = cbt->dhandle. Once background threads have invalidated the cursor's dhandle, that assignment leaves session->dhandle NULL, and S2BT(session) then dereferences it.
- Trigger: Observed during FCV downgrade in cleanShutdown, which triggers a second wiredtiger_open. Introduced by WT-17169.
- Pattern: The immediately following verify_session block at conn_api.c:3631 correctly uses __wt_open_internal_session for post-workers metadata access. __conn_cleanup_chunk_cache should follow the same pattern.
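The invariant described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical and is not a WiredTiger type or function, and the flag check returns an error where the real code uses an assertion. It models the default session being marked as unable to hold data handles as soon as workers start, so that metadata access through it fails early while a dedicated internal session still succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are WiredTiger definitions. */
#define TOY_SESSION_NO_DATA_HANDLES 0x1u

typedef struct {
    const char *name;
    unsigned flags;
    const void *dhandle; /* models session->dhandle */
} toy_session;

typedef struct {
    toy_session default_session;
    bool workers_running;
} toy_conn;

/*
 * Models the point where background threads start. In this sketch the
 * default session is flagged right away (an assumption of the model).
 */
void
toy_start_workers(toy_conn *conn)
{
    conn->workers_running = true;
    conn->default_session.flags |= TOY_SESSION_NO_DATA_HANDLES;
}

/*
 * Models a metadata search. Returns 0 on success, or -1 when the session
 * is not allowed to hold a data handle.
 */
int
toy_metadata_search(toy_session *session, const void *metadata_dhandle)
{
    if (session->flags & TOY_SESSION_NO_DATA_HANDLES)
        return (-1);
    session->dhandle = metadata_dhandle; /* safe: no other thread uses this session */
    session->dhandle = NULL;             /* release the handle when done */
    return (0);
}

/*
 * Models the cleanup step, run either on the shared default session or on
 * a private session that lives only for the duration of the call.
 */
int
toy_cleanup_chunk_cache(toy_conn *conn, bool use_default_session)
{
    static const int metadata = 0; /* placeholder for the metadata handle */
    toy_session internal = {"cleanup chunk cache", 0, NULL};
    toy_session *session = use_default_session ? &conn->default_session : &internal;

    return (toy_metadata_search(session, &metadata));
}
```

In this model, cleanup through the default session succeeds before `toy_start_workers` and is rejected afterwards, while cleanup through the private session succeeds in both cases.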
Proposed Solution
Open a fresh internal session for the chunk cache cleanup instead of reusing the default session:
/* conn_api.c, after line 3621 */
WT_SESSION_IMPL *cleanup_session;
WT_ERR(__wt_open_internal_session(conn, "cleanup chunk cache", false, 0, 0, &cleanup_session));
ret = __conn_cleanup_chunk_cache(cleanup_session);
WT_TRET(__wt_session_close_internal(cleanup_session));
WT_ERR(ret);
Files to change: src/conn/conn_api.c (call site only; __conn_cleanup_chunk_cache itself is fine).
Definition of Done
- __conn_cleanup_chunk_cache no longer uses the default session after __wti_connection_workers returns
- UBSAN sanitizer build no longer crashes on the repro path (FCV downgrade / second wiredtiger_open)
- BF-43009 is resolved
- related to WT-17363 Set WT_SESSION_NO_DATA_HANDLES on default session immediately after __wti_connection_workers to catch unsafe dhandle access early (Open)