Default session used after background threads start in wiredtiger_open causes null dhandle crash in __conn_cleanup_chunk_cache


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0
    • Affects Version/s: None
    • Component/s: Tiered Storage
    • None
    • Storage Engines - Persistence
    • 123.591
    • SE Persistence - 2026-05-08
    • None

      Issue Summary

      __conn_cleanup_chunk_cache (introduced in WT-17169) calls __wt_metadata_search on conn->default_session after __wti_connection_workers has returned. At that point background threads (eviction, handle sweep, log manager, checkpoint, etc.) are already running and can modify session->dhandle concurrently. This leaves the session's cached metadata cursor in an invalid state, and a UBSAN build crashes with a null pointer dereference when S2BT(session) dereferences a NULL session->dhandle:

      src/btree/bt_read.c:500:13: runtime error: member access within null pointer of type 'WT_DATA_HANDLE'
      
      #0  __wt_page_in_func          src/btree/bt_read.c:500
      #1  __wt_page_swap_func        src/include/btree_inline.h:2646
      #2  __wt_row_search            src/btree/row_srch.c:576
      #3  __cursor_row_search        src/btree/bt_cursor.c:513
      #4  __wt_btcur_search          src/btree/bt_cursor.c:765
      #5  __curfile_search           src/cursor/cur_file.c:314
      #6  __wt_metadata_search       src/meta/meta_table.c:340
      #7  __conn_cleanup_chunk_cache src/conn/conn_api.c:1558
      

      Context

      • Affected path: src/conn/conn_api.c, function wiredtiger_open
      • Root cause: conn->default_session->dhandle is unsafe to use once __wti_connection_workers returns, because background threads are then running. The codebase already encodes this invariant: immediately after wiredtiger_open finishes, WT_SESSION_NO_DATA_HANDLES is set on the default session (conn_api.c:3648) and enforced by an assertion in session_dhandle.c:932. The bug is that __conn_cleanup_chunk_cache is called at line 3624, after workers start (line 3621) but before this flag is set.
      • Crash mechanism: CURSOR_API_CALL in __curfile_search does session->dhandle = cbt->dhandle. After background threads invalidate the cursor's dhandle, this sets session->dhandle to NULL. S2BT(session) then dereferences it.
      • Trigger: Observed during FCV downgrade in cleanShutdown, which triggers a second wiredtiger_open. Introduced by WT-17169.
      • Pattern: The verify_session block that immediately follows at conn_api.c:3631 correctly uses __wt_open_internal_session for post-workers metadata access; __conn_cleanup_chunk_cache should follow the same pattern.
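
      To make the crash mechanism concrete, here is a minimal, hypothetical C model of the pointer chain described in the Context bullets. The types model_dhandle, model_session_impl and model_cursor_btree and the helpers model_cursor_api_call and model_curfile_search are illustrative stand-ins, not WiredTiger code; they keep only the fields needed to show how a cached cursor whose dhandle has been cleared leads S2BT(session) to follow a NULL session->dhandle. The model returns -1 at the point where the real code faults, so it can run safely.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for WiredTiger internals. Only the
 * fields needed to follow the pointer chain are modeled.
 */
typedef struct {
    void *handle; /* For a btree handle, points at the WT_BTREE. */
} model_dhandle;

typedef struct {
    model_dhandle *dhandle; /* Handle the session is currently operating on. */
} model_session_impl;

typedef struct {
    model_dhandle *dhandle; /* Handle the cached cursor was opened against. */
} model_cursor_btree;

/* Models S2BT(session): follows session->dhandle with no NULL check. */
#define MODEL_S2BT(s) ((s)->dhandle->handle)

/* Models the step of CURSOR_API_CALL that binds the cursor's handle to the session. */
static void
model_cursor_api_call(model_session_impl *session, model_cursor_btree *cbt)
{
    session->dhandle = cbt->dhandle;
}

/*
 * Models __curfile_search up to the S2BT dereference. Where the real path
 * would dereference NULL and crash, the model returns -1 instead.
 */
static int
model_curfile_search(model_session_impl *session, model_cursor_btree *cbt, void **btreep)
{
    model_cursor_api_call(session, cbt);
    if (session->dhandle == NULL)
        return (-1); /* The real path has no such check: S2BT faults here. */
    *btreep = MODEL_S2BT(session);
    return (0);
}
```

      The NULL check exists only so the model can run. The fix proposed in this issue does not add such a check; it stops sharing the default session's cached metadata cursor with running background threads.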

      Proposed Solution

      Open a fresh internal session for the chunk cache cleanup instead of reusing the default session:

      /* conn_api.c, after line 3621 */
      WT_SESSION_IMPL *cleanup_session;

      /* Use a private internal session: the default session's dhandle is unsafe once workers run. */
      WT_ERR(__wt_open_internal_session(conn, "cleanup chunk cache", false, 0, 0, &cleanup_session));
      ret = __conn_cleanup_chunk_cache(cleanup_session);
      /* Always close the session; WT_TRET merges the close result into ret. */
      WT_TRET(__wt_session_close_internal(cleanup_session));
      WT_ERR(ret);
      

      Files to change: src/conn/conn_api.c (call site only; __conn_cleanup_chunk_cache itself is fine).
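
      The call-site pattern in the proposed fix (open a private session, run the cleanup, always close the session, report the first error) can be exercised in isolation with the hypothetical C model below. MODEL_ERR and MODEL_TRET are simplified stand-ins for WT_ERR and WT_TRET; as we understand it, the real WT_TRET also has special rules for certain return codes such as WT_PANIC and WT_NOTFOUND, which this sketch ignores. All model_* names are assumptions for illustration only.

```c
#include <errno.h>

/*
 * Hypothetical, simplified stand-ins for WT_ERR and WT_TRET. MODEL_ERR jumps
 * to the err label on failure; MODEL_TRET keeps only the first non-zero error.
 */
#define MODEL_ERR(a)          \
    do {                      \
        if ((ret = (a)) != 0) \
            goto err;         \
    } while (0)
#define MODEL_TRET(a)               \
    do {                            \
        int tret_ = (a);            \
        if (tret_ != 0 && ret == 0) \
            ret = tret_;            \
    } while (0)

typedef struct {
    int open;        /* Nonzero while the session is open. */
    int close_error; /* Error the close call should return, for testing. */
} model_session;

/* Counters so a test can check that every opened session was closed. */
static int sessions_opened, sessions_closed;

static int
model_open_internal_session(model_session *s, int fail)
{
    if (fail)
        return (ENOMEM);
    s->open = 1;
    ++sessions_opened;
    return (0);
}

static int
model_session_close_internal(model_session *s)
{
    s->open = 0;
    ++sessions_closed;
    return (s->close_error);
}

/* Stands in for __conn_cleanup_chunk_cache: uses only the session it is given. */
static int
model_cleanup_chunk_cache(model_session *s, int cleanup_error)
{
    return (s->open ? cleanup_error : EINVAL);
}

/* Open a private session, run the cleanup, always close, return the first error. */
static int
model_open_step(int open_fail, int cleanup_error, int close_error)
{
    model_session cleanup_session = {0, close_error};
    int ret = 0;

    MODEL_ERR(model_open_internal_session(&cleanup_session, open_fail));
    ret = model_cleanup_chunk_cache(&cleanup_session, cleanup_error);
    MODEL_TRET(model_session_close_internal(&cleanup_session));
    MODEL_ERR(ret);

err:
    return (ret);
}
```

      The counters let a test confirm that every successfully opened session is closed, even when the cleanup or the close itself fails, and that a cleanup error is not masked by a later close error.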

      Definition of Done

      • __conn_cleanup_chunk_cache no longer uses the default session after __wti_connection_workers returns
      • UBSAN sanitizer build no longer crashes on the repro path (FCV downgrade / second wiredtiger_open)
      • BF-43009 is resolved

            Assignee:
            Jasmine Bi
            Reporter:
            Etienne Petrel
            Votes:
            0
            Watchers:
            2
