Fix race condition between GC pruning walk and concurrent drops in layered ingest


    • Storage Engines, Storage Engines - Foundations
    • 979.81

      Problem

      There is a race condition between __wti_layered_iterate_ingest_tables_for_gc_pruning and concurrent collection drops that causes a crash in __wt_session_release_dhandle_v2.

      When a collection is dropped concurrently before __layered_update_ingest_table_prune_timestamp runs, that function's call to __wt_session_get_dhandle returns ENOENT (the metadata is missing). Because the call is wrapped in WT_ERR_NOTFOUND_OK (instead of WT_RET_NOTFOUND_OK), the error sends execution into the error/cleanup path, which attempts to release the dhandle even though it was never acquired. The result is an invalid memory access (SIGSEGV) in __wt_session_release_dhandle_v2.
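      The error-path problem described above can be reduced to a small, self-contained C model. This is a hypothetical sketch, not WiredTiger code: toy_dhandle, toy_get_dhandle, toy_release_dhandle, TOY_NOTFOUND and prune_buggy are invented stand-ins, and the goto-based cleanup only approximates the WT_ERR_NOTFOUND_OK behavior the ticket describes. Instead of crashing, the toy release function counts releases of a handle that was never acquired.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for WT_NOTFOUND; the value is illustrative only. */
#define TOY_NOTFOUND (-1000)

/* Hypothetical model of a data handle: tracks whether this caller holds it. */
struct toy_dhandle {
    bool acquired;
};

/* Counts releases of handles that were never acquired (a SIGSEGV in real code). */
static int toy_invalid_releases = 0;

/*
 * Hypothetical lookup: as the ticket says of __wt_session_get_dhandle,
 * a dropped table yields ENOENT, never TOY_NOTFOUND.
 */
static int toy_get_dhandle(bool dropped, struct toy_dhandle *dh)
{
    if (dropped)
        return ENOENT;
    dh->acquired = true;
    return 0;
}

static void toy_release_dhandle(struct toy_dhandle *dh)
{
    if (!dh->acquired) {
        toy_invalid_releases++; /* real code would touch invalid memory here */
        return;
    }
    dh->acquired = false;
}

/* Buggy shape: any error other than "not found" jumps to the shared cleanup label. */
static int prune_buggy(bool dropped)
{
    struct toy_dhandle dh = { false };
    int ret;

    if ((ret = toy_get_dhandle(dropped, &dh)) != 0 && ret != TOY_NOTFOUND)
        goto err; /* ENOENT lands here even though nothing was acquired */

    /* ... update the prune timestamp using the handle ... */

err:
    toy_release_dhandle(&dh); /* cleanup assumes the handle is held */
    return ret;
}
```

      In this model, prune_buggy(false) acquires and releases the handle normally, while prune_buggy(true) returns ENOENT and still reaches the release, mirroring the release of a never-acquired dhandle.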

      The crash sequence:
      1. A checkpoint fires, and the Disagg thread begins walking all layered dhandles via __wti_layered_iterate_ingest_tables_for_gc_pruning.
      2. The same checkpoint advances checkpointOldestTimestamp past a collection's pending drop timestamp, unblocking its deferred ident drops.
      3. Concurrent ReplWriterWorker threads drop the idents, and the corresponding WiredTiger dhandles are freed.
      4. The Disagg thread crashes in __wt_session_release_dhandle_v2 while dereferencing a freed dhandle.

      Reproduced in WT with a Python test that uses sleeps to widen the race window. Also observed in production (AF-17292) and in CI in pali_chaos.js (BF-43413).

      Fix

      Three issues to address in src/conn/conn_layered_ingest.c (__layered_update_ingest_table_prune_timestamp):

      1. Replace WT_ERR_NOTFOUND_OK with WT_RET_NOTFOUND_OK around __wt_session_get_dhandle, so a missing table causes an early return instead of falling into the error/cleanup path.
      2. __wt_session_get_dhandle never returns WT_NOTFOUND; it converts missing metadata to ENOENT. Update the check to handle ENOENT rather than WT_NOTFOUND.
      3. Wrap the function with the schema lock to prevent concurrent drops while GC pruning walks the dhandle list, keeping the fix simple and safe.
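      The three changes can be sketched with a small, self-contained C model. This is hypothetical and not WiredTiger's implementation: toy_schema_lock (a pthread mutex) stands in for the schema lock, toy_drop stands in for a concurrent drop, and toy_table, toy_get_dhandle, toy_release_dhandle and prune_fixed are invented names. Treating ENOENT as "skip this table" and returning 0 so the caller's walk can continue is an assumption; the ticket only specifies an early return.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the schema lock; drops and pruning both take it. */
static pthread_mutex_t toy_schema_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical table state: "dropped" means its metadata is gone. */
struct toy_table {
    bool dropped;
    bool dhandle_held;
    int releases;
};

/* Hypothetical drop: runs under the schema lock, so it cannot interleave with pruning. */
static void toy_drop(struct toy_table *t)
{
    pthread_mutex_lock(&toy_schema_lock);
    t->dropped = true;
    pthread_mutex_unlock(&toy_schema_lock);
}

/* Hypothetical lookup: a dropped table yields ENOENT, never a NOTFOUND code. */
static int toy_get_dhandle(struct toy_table *t)
{
    if (t->dropped)
        return ENOENT;
    t->dhandle_held = true;
    return 0;
}

static void toy_release_dhandle(struct toy_table *t)
{
    t->dhandle_held = false;
    t->releases++;
}

/* Fixed shape: hold the lock, return early on lookup failure, release only what was acquired. */
static int prune_fixed(struct toy_table *t)
{
    int ret;

    pthread_mutex_lock(&toy_schema_lock);
    ret = toy_get_dhandle(t);
    if (ret != 0) {
        pthread_mutex_unlock(&toy_schema_lock);
        /* Assumption: a concurrently dropped table is skipped, not an error. */
        return ret == ENOENT ? 0 : ret;
    }

    /* ... update the prune timestamp using the handle ... */

    toy_release_dhandle(t);
    pthread_mutex_unlock(&toy_schema_lock);
    return 0;
}
```

      The design choice mirrors the ticket: the lock removes the window in which a drop can free a handle mid-walk, and the early return means the cleanup path only ever runs for a handle that was actually acquired. Older toolchains may need -pthread to link.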

      References

      • Production crash: AF-17292
      • CI failure: BF-43413
      • Crash location: __layered_update_ingest_table_prune_timestamp in src/conn/conn_layered_ingest.c (~line 862)

            Assignee:
            Sid Mahajan
            Reporter:
            Sid Mahajan
            Votes:
            0
            Watchers:
            1
