-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Btree, Compaction
-
None
-
Environment: GCC 13, Ubuntu 24.04 with diagnostic check, ASan, and LSan enabled
-
Storage Engines - Persistence
-
26.701
-
SE Persistence backlog
-
1
During database compaction, __compact_page_replace_addr is responsible for replacing a page's WT_ADDR structure. When the page address is off-page (i.e., __wt_off_page(ref->home, addr) evaluates to true), the function updates the address structure in place.
In the current code, the memory-management operations happen in this order:
1. It schedules the old block cookie to be freed via __wti_ref_addr_safe_free.
2. It sets addr->block_cookie = NULL.
3. It calls __wt_strndup(...) to duplicate the new block cookie.
4. If __wt_strndup fails under memory pressure and returns ENOMEM, the function jumps to the err: label.
5. Since addr == old_addr, the struct itself is not freed, and the function propagates the ENOMEM error up.
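The ordering in steps 1-5 can be modelled outside the engine. The following is a minimal sketch, not WiredTiger code: struct bug_addr, bug_strndup, bug_fail_next_alloc, and bug_replace_addr are hypothetical stand-ins for WT_ADDR, __wt_strndup, an injected allocation failure, and the replace function. The old cookie is freed immediately here, whereas the real code defers the free, so the sketch models only the NULL-cookie outcome.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WT_ADDR; only the fields discussed here. */
struct bug_addr {
    void *block_cookie;
    size_t block_cookie_size;
};

/* Set to nonzero to make the next bug_strndup call fail with ENOMEM. */
static int bug_fail_next_alloc = 0;

/* Hypothetical stand-in for __wt_strndup: copy len bytes into a new buffer. */
static int
bug_strndup(const void *src, size_t len, void **dstp)
{
    void *p;

    if (bug_fail_next_alloc) {
        bug_fail_next_alloc = 0;
        return (ENOMEM);
    }
    if ((p = malloc(len + 1)) == NULL)
        return (ENOMEM);
    memcpy(p, src, len);
    ((char *)p)[len] = '\0';
    *dstp = p;
    return (0);
}

/* Release-then-allocate ordering, following steps 1-5 above. */
static int
bug_replace_addr(struct bug_addr *addr, const void *cookie, size_t size)
{
    int ret;

    free(addr->block_cookie);  /* Step 1 (the real code schedules the free). */
    addr->block_cookie = NULL; /* Step 2. */
    /* Step 3: duplicate the new cookie directly into the live structure. */
    if ((ret = bug_strndup(cookie, size, &addr->block_cookie)) != 0)
        return (ret);          /* Steps 4-5: addr is left with a NULL cookie. */
    addr->block_cookie_size = size;
    return (0);
}
```

In this model, forcing the duplication to fail leaves the structure with a NULL block_cookie while the function reports ENOMEM, which is the state described in step 5.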
Impact:
At this point, the compaction operation is aborted, but the live tree still holds ref->addr pointing to old_addr, whose block_cookie has been permanently set to NULL. Any subsequent reader, writer, or eviction thread that attempts to access ref->addr->block_cookie will dereference a NULL pointer, causing a database crash or undefined behavior. In addition, because the pointer to the original block cookie was set to NULL, the engine loses the reference to that heap allocation, causing a memory leak (caught by LeakSanitizer/LSan). A repro is available for this.
Proposed Fix:
The new block cookie should be allocated first into a temporary pointer (new_cookie), before the old cookie is freed or addr->block_cookie is modified. If the allocation fails, the function returns ENOMEM early, leaving the original WT_ADDR and its block_cookie completely untouched and valid.
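A minimal sketch of the allocate-first ordering, again outside the engine. struct fix_addr, fix_alloc_fn, fix_alloc_fail, and fix_replace_addr are hypothetical names used for illustration; the allocator is passed in as a parameter only so that a failure can be injected, and the immediate free() stands in for the deferred free the real code would use.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WT_ADDR; only the fields discussed here. */
struct fix_addr {
    void *block_cookie;
    size_t block_cookie_size;
};

/* Allocator signature, so a test can inject a failing allocator. */
typedef void *(*fix_alloc_fn)(size_t);

/* Allocator that always fails, simulating memory pressure. */
static void *
fix_alloc_fail(size_t len)
{
    (void)len;
    return (NULL);
}

/* Allocate-then-release ordering: nothing is modified until the copy exists. */
static int
fix_replace_addr(
  struct fix_addr *addr, const void *cookie, size_t size, fix_alloc_fn alloc)
{
    void *new_cookie;

    /* Allocate into a temporary first; on failure addr is untouched. */
    if ((new_cookie = alloc(size == 0 ? 1 : size)) == NULL)
        return (ENOMEM);
    memcpy(new_cookie, cookie, size);

    /* Only now release the old cookie and publish the new one. */
    free(addr->block_cookie);
    addr->block_cookie = new_cookie;
    addr->block_cookie_size = size;
    return (0);
}
```

With this ordering, a failed allocation returns ENOMEM before any field of the structure changes, so callers that still hold the old address keep seeing a valid cookie and size.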