-
Type:
Technical Debt
-
Resolution: Fixed
-
Priority:
Major - P3
-
Affects Version/s: None
-
Component/s: Block Cache, Cache and Eviction
-
None
-
Storage Engines, Storage Engines - Persistence
-
0.001
-
SE Persistence backlog
-
None
Issue Summary
__wt_blkcache_compress reports its best-effort outcomes (no compressor configured, block too small, compression didn't help) through its output parameters and returns 0 in all those cases. A non-zero return is reserved for genuine failures: scratch-buffer allocation (OOM), bm->write_size, or the compressor extension callback itself erroring. Two callers treat that non-zero return inconsistently:
- __wt_blkcache_write (src/block_cache/block_io.c) uses WT_ERR and propagates the failure — correct for the durable write path.
- __evict_page_victim_cache (src/evict/evict_page.c) uses WT_IGNORE_RET, silently swallowing the error and then caching the uncompressed buffer.
The divergence reads as a smell, and more concretely the victim-cache path proceeds to push memory into the cache even right after an allocation failure.
Context
- __evict_page_victim_cache is a static void function, so it structurally cannot propagate via WT_RET/WT_ERR; the WT_IGNORE_RET mirrors the plh_cache_put call just below it.
- The victim cache is a best-effort optimization for disaggregated storage, so failing eviction on a cache-population error would be wrong.
- On every error path of __wt_blkcache_compress the output parameters stay consistent (compressed_bufp = NULL, compressedp = false), so bailing out is safe.
Proposed Solution
- Keep WT_ERR in __wt_blkcache_write.
- In __evict_page_victim_cache, capture the return value and abandon the caching attempt (early return) on a non-zero result instead of swallowing it and caching uncompressed after a likely OOM. Add a brief comment explaining the deliberate bail-out and why best-effort semantics are preserved. Or log a warning/error message and continue
Definition of Done
- The two call sites have consistent, intentional error handling.
- The victim cache no longer caches a page after a compression error unless we decide to go with printing a warning and ignoring the error code.
- Change compiles and passes existing eviction / block-cache tests.