Set WT_SESSION_NO_DATA_HANDLES on default session immediately after __wti_connection_workers to catch unsafe dhandle access early


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines, Storage Engines - Foundations
    • 122.291

      Issue Summary

      WT_SESSION_NO_DATA_HANDLES is currently set on conn->default_session at the very end of wiredtiger_open (conn_api.c:3648), after the connection is fully ready. The flag should be set immediately after __wti_connection_workers returns, since that is the precise point at which background threads are live and session->dhandle is no longer safe to use from the default session.

      Moving the flag earlier turns a latent runtime crash into an immediate assertion failure in debug builds, giving developers a clear signal at the right moment rather than a subtle sanitizer-only crash in an unrelated code path.

      Context

      • The enforcement mechanism already exists: session_dhandle.c:932 asserts !F_ISSET(session, WT_SESSION_NO_DATA_HANDLES) on every dhandle acquisition. Today the assertion cannot fire during the tail of wiredtiger_open because the flag is not set until the very end.
      • WT-17362 is a concrete example of this gap: __conn_cleanup_chunk_cache (introduced in WT-17169) passed the default session to __wt_metadata_search after workers had started, causing a UBSAN null-pointer crash via S2BT(). The bug went unnoticed because the flag was not yet set at that point in wiredtiger_open.
      • All code between __wti_connection_workers (line 3621) and the current flag placement (line 3648) has been audited:
        • __conn_cleanup_chunk_cache: fixed in WT-17362 to use an internal session
        • __wt_open_internal_session + __wt_hs_verify: already use their own session, unaffected
        • WT_STAT_CONN_SET: pure stats writes, no dhandle, unaffected
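
      The effect of the flag placement can be illustrated with a small standalone model. This is only a sketch, not WiredTiger code: every toy_* name and TOY_SESSION_NO_DATA_HANDLES is hypothetical, and the real check is an assertion in debug builds rather than a boolean return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_SESSION_NO_DATA_HANDLES. */
#define TOY_SESSION_NO_DATA_HANDLES 0x1u

/* Hypothetical, heavily reduced session: a flag word and a current dhandle. */
typedef struct {
    uint32_t flags;
    const char *dhandle;
} toy_session;

/*
 * Model of a dhandle acquisition. Returns false where the real code would
 * trip its debug assertion, true if the acquisition is allowed.
 */
bool
toy_get_dhandle(toy_session *s, const char *name)
{
    if (s->flags & TOY_SESSION_NO_DATA_HANDLES)
        return false;
    s->dhandle = name;
    return true;
}

/*
 * Model of the tail of the open path: workers start, some later code
 * mistakenly acquires a dhandle on the default session, and the flag is set
 * either right after the workers start (proposed) or at the end (current).
 * Returns true if the mistaken acquisition was caught.
 */
bool
toy_open_catches_late_access(bool set_flag_early)
{
    toy_session s = {0, NULL};

    /* Background workers would be started here. */
    if (set_flag_early)
        s.flags |= TOY_SESSION_NO_DATA_HANDLES;

    /* Mistaken dhandle use on the default session after workers started. */
    bool caught = !toy_get_dhandle(&s, "file:example.wt");

    /* Current placement: flag set only at the end of the open path. */
    s.flags |= TOY_SESSION_NO_DATA_HANDLES;
    return caught;
}

/* Usage: the early placement catches the access, the late one does not. */
void
toy_demo(void)
{
    assert(toy_open_catches_late_access(true));
    assert(!toy_open_catches_late_access(false));
}
```

      In this model the late placement lets the mistaken acquisition through silently, which mirrors how the WT-17362 bug escaped the existing assertion.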

      Proposed Solution

      Move F_SET(session, WT_SESSION_NO_DATA_HANDLES) from line 3648 to immediately after line 3621 (__wti_connection_workers return), and update the comment to reflect the new placement:

          WT_ERR(__wti_connection_workers(session, cfg));
      
          /*
           * Background threads are now running. The default session's dhandle is no longer
           * safe to use: set the flag so any future dhandle access on this session asserts.
           */
          F_SET(session, WT_SESSION_NO_DATA_HANDLES);
      

      File to change: src/conn/conn_api.c only.

      Definition of Done

      • Flag is set immediately after __wti_connection_workers in wiredtiger_open
      • All existing tests pass
      • Any future code that mistakenly uses the default session for dhandle access after workers start will fail fast with an assertion in debug builds

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Etienne Petrel