- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Layered Tables
Problem
There is a race condition between __wti_layered_iterate_ingest_tables_for_gc_pruning and concurrent collection drops that causes a crash in __wt_session_release_dhandle_v2.
When a concurrent drop completes before __layered_update_ingest_table_prune_timestamp is called, __wt_session_get_dhandle returns ENOENT (missing metadata). The call is wrapped with WT_ERR_NOTFOUND_OK (instead of WT_RET_NOTFOUND_OK), and ENOENT is not WT_NOTFOUND, so execution jumps to the error path, which attempts to release a dhandle that was never acquired. The result is an invalid memory access (SIGSEGV) in __wt_session_release_dhandle_v2.
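The control flow described above can be sketched in isolation. This is a minimal illustration, not WiredTiger code: every `demo_`/`DEMO_` name is hypothetical, the macro is a simplified stand-in for the shape of an "error, but not-found is OK" wrapper, and the release function counts the invalid release instead of dereferencing the missing handle so that the sketch can run without crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not WiredTiger definitions. */
#define DEMO_NOTFOUND (-31803) /* placeholder for a "not found" code */

struct demo_session {
    void *dhandle;    /* NULL until a handle has been acquired */
    int bad_releases; /* releases attempted without a handle */
};

/* "ERR" shape: any error other than DEMO_NOTFOUND jumps to cleanup. */
#define DEMO_ERR_NOTFOUND_OK(call)            \
    do {                                      \
        ret = (call);                         \
        if (ret != 0 && ret != DEMO_NOTFOUND) \
            goto err;                         \
        ret = 0;                              \
    } while (0)

/* Simulates a lookup for a table whose metadata was just dropped. */
static int demo_get_dhandle(struct demo_session *s)
{
    s->dhandle = NULL;
    return ENOENT; /* missing metadata is reported as ENOENT */
}

static void demo_release_dhandle(struct demo_session *s)
{
    assert(s != NULL);
    if (s->dhandle == NULL) {
        s->bad_releases++; /* the real code would dereference here */
        return;
    }
    s->dhandle = NULL;
}

/* Buggy shape: ENOENT is not the "OK" code, so control reaches err. */
int demo_prune_buggy(struct demo_session *s)
{
    int ret = 0;

    DEMO_ERR_NOTFOUND_OK(demo_get_dhandle(s));
    /* ... pruning work would go here ... */
err:
    demo_release_dhandle(s); /* runs even though nothing was acquired */
    return ret;
}
```

Calling `demo_prune_buggy` on a zero-initialized `struct demo_session` returns `ENOENT` and leaves `bad_releases` at 1, which corresponds to the release of a never-acquired handle in the ticket.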
The crash sequence:
1. A checkpoint fires and the Disagg thread begins walking all layered dhandles via __wti_layered_iterate_ingest_tables_for_gc_pruning.
2. The same checkpoint advances checkpointOldestTimestamp past the pending drop timestamp for a collection, unblocking deferred ident drops.
3. Concurrent ReplWriterWorker threads drop the idents, and the corresponding WiredTiger dhandles are freed.
4. The Disagg thread crashes in __wt_session_release_dhandle_v2 while dereferencing a freed dhandle.
Reproduced in WT using a Python test that inserts sleeps to widen the race window. Also observed in production (AF-17292) and in the CI test pali_chaos.js (BF-43413).
Fix
Three issues to address in src/conn/conn_layered_ingest.c (__layered_update_ingest_table_prune_timestamp):
- Replace WT_ERR_NOTFOUND_OK with WT_RET_NOTFOUND_OK around __wt_session_get_dhandle, so a missing table causes an early return instead of falling into the error/cleanup path.
- __wt_session_get_dhandle never returns WT_NOTFOUND; it converts missing metadata to ENOENT. Update the check to test for ENOENT instead.
- Wrap the function with the schema lock to prevent concurrent drops while GC pruning walks the dhandle list, keeping the fix simple and safe.
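The three changes above can be sketched together as follows. This is a rough illustration under stated assumptions, not the actual patch: all `demo_` names are hypothetical, a pthread mutex stands in for the schema lock, and returning 0 when the table is missing is an assumption about the desired behavior rather than something the ticket specifies.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not WiredTiger's real API. */
static pthread_mutex_t demo_schema_lock = PTHREAD_MUTEX_INITIALIZER;

struct demo_table {
    int exists;           /* 0 once the table has been dropped */
    long prune_timestamp; /* last prune timestamp applied */
    int handle_refs;      /* outstanding handle references */
};

static int demo_get_dhandle(struct demo_table *t)
{
    if (!t->exists)
        return ENOENT; /* missing metadata is reported as ENOENT */
    t->handle_refs++;
    return 0;
}

static void demo_release_dhandle(struct demo_table *t)
{
    assert(t->handle_refs > 0); /* release only what was acquired */
    t->handle_refs--;
}

/* Caller must hold demo_schema_lock. */
static int demo_update_prune_timestamp_locked(struct demo_table *t, long ts)
{
    int ret = demo_get_dhandle(t);

    if (ret == ENOENT)
        return 0; /* assumed: dropped table, nothing to prune or release */
    if (ret != 0)
        return ret; /* other errors return early, before any cleanup path */

    t->prune_timestamp = ts;
    demo_release_dhandle(t);
    return 0;
}

/* Holding the lock for the whole update keeps drops from interleaving. */
int demo_update_prune_timestamp(struct demo_table *t, long ts)
{
    int ret;

    pthread_mutex_lock(&demo_schema_lock);
    ret = demo_update_prune_timestamp_locked(t, ts);
    pthread_mutex_unlock(&demo_schema_lock);
    return ret;
}

/* A drop takes the same lock, so it cannot run in the middle of an update. */
int demo_drop_table(struct demo_table *t)
{
    pthread_mutex_lock(&demo_schema_lock);
    t->exists = 0;
    pthread_mutex_unlock(&demo_schema_lock);
    return 0;
}
```

In this sketch an update on a live table sets the timestamp and leaves no handle references behind, while an update after `demo_drop_table` returns without touching the table or calling the release function.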
References
- Production crash: AF-17292
- CI failure: BF-43413
- Crash location: src/conn/conn_layered_ingest.c — __layered_update_ingest_table_prune_timestamp (~line 862)