- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Cache and Eviction
- Storage Engines - Transactions
- 207.359
Background
With shared disk image support, memory_footprint now includes both:
- page-private memory (extra page size)
- shared disk image size
This is mostly correct for eviction and cache accounting because evicting a page with ref_count == 1 frees both the page-private memory and the shared disk image memory.
However, memory_footprint is also used in many other paths, and we should audit all usages to verify whether including shared disk image size is semantically correct in each case.
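To make the distinction concrete, here is a minimal sketch of the two quantities discussed above. The struct and function names (shared_image, page, page_footprint, bytes_freed_on_evict) are hypothetical and are not WiredTiger types. The sketch also assumes that a shared disk image is released only when its last reference goes away, which is an assumption about the intended semantics rather than a statement about the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a disk image shared by several in-memory pages. */
struct shared_image {
    size_t size;        /* bytes held by the disk image */
    unsigned ref_count; /* number of pages currently referencing the image */
};

/* Hypothetical stand-in for a page; this is not the WT_PAGE layout. */
struct page {
    size_t private_bytes;       /* page-private memory */
    struct shared_image *image; /* may be shared with other pages */
};

/* Footprint as described above: page-private memory plus shared image size. */
static size_t
page_footprint(const struct page *p)
{
    return (p->private_bytes + p->image->size);
}

/*
 * Bytes released by evicting this page, ASSUMING the image is freed only
 * when the evicted page holds the last reference to it.
 */
static size_t
bytes_freed_on_evict(const struct page *p)
{
    assert(p->image->ref_count > 0);
    if (p->image->ref_count == 1)
        return (p->private_bytes + p->image->size);
    return (p->private_bytes);
}
```

Under that assumption the two functions agree when ref_count == 1 and diverge when the image is shared, which is the case the audit needs to examine for each usage.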
Current Findings
The following are known usages of memory_footprint:
1. Cache accounting
- total cache stats accounting
- __wt_evict_page_cache_bytes_decr
- __wt_cache_inmem_incr
- etc.
2. Per-page size stats / verbose
- __evict_page_clean_or_dirty_size_max: tracks largest page size seen at eviction
- __evict_try_queue_page: verbose logging for urgent queue eviction
- max_pagesize in __evict_stat_walk
- cache_state_root_size in __wt_evict_cache_stat_walk
3. Eviction decisions
- __evict_priority
    if (__wt_atomic_load_size_relaxed(&page->memory_footprint) > btree->splitmempage)
        return (WT_READGEN_EVICT_SOON);
- __evict_push_candidate
    evict_entry->score += WT_MEGABYTE -
      WT_MIN(WT_MEGABYTE, __wt_atomic_load_size_relaxed(&ref->page->memory_footprint));
- __evict_try_queue_page
    if (modified &&
      (__wt_atomic_load_uint64_relaxed(&page->read_gen) == WT_READGEN_EVICT_SOON ||
        __wt_atomic_load_size_relaxed(&page->memory_footprint) >= btree->splitmempage)) {
        /* push to urgent queue */
    }
- __evict_walk_tree
    if (__wt_ref_is_root(ref) || evict_entry == start || give_up ||
      __wt_atomic_load_size_relaxed(&ref->page->memory_footprint) >= btree->splitmempage) {
        ...
    }
  Stops the tree walk if the current page is considered large.
4. Tree walk diagnostic dump
- intl_bytes / leaf_bytes: used in verbose logging in __verbose_dump_cache_single
- leaf_bytes / internal_bytes: used in __wt_sync_file verbose output
5. In-memory split decisions
- __wt_leaf_page_can_split
    if (__wt_atomic_load_size_relaxed(&page->memory_footprint) < btree->splitmempage)
        return (false);
    if (__wt_atomic_load_size_relaxed(&page->memory_footprint) > (size_t)btree->maxleafpage * 2) {
        ...
    }
- __split_internal_should_split
    if (__wt_atomic_load_size_relaxed(&page->memory_footprint) > btree->maxmempage)
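As an aid for the audit, the sketch below isolates the two kinds of size comparison quoted in the list above so their sensitivity to the shared disk image size can be checked with concrete numbers. MEGABYTE and MIN are local stand-ins for WT_MEGABYTE and WT_MIN, and the function names are hypothetical. The sketch makes no claim about how the surrounding eviction code interprets the resulting score.

```c
#include <stddef.h>
#include <stdint.h>

#define MEGABYTE ((size_t)1048576)        /* local stand-in for WT_MEGABYTE */
#define MIN(a, b) ((a) < (b) ? (a) : (b)) /* local stand-in for WT_MIN */

/*
 * Size term added to the candidate score, mirroring the expression quoted
 * for __evict_push_candidate. It shrinks as the footprint grows and is
 * zero for footprints of one megabyte or more.
 */
static uint64_t
size_score_term(size_t footprint)
{
    return ((uint64_t)(MEGABYTE - MIN(MEGABYTE, footprint)));
}

/* Threshold test mirroring the ">= btree->splitmempage" comparisons. */
static int
reaches_split_threshold(size_t footprint, size_t splitmempage)
{
    return (footprint >= splitmempage);
}
```

For example, with 200 KB of page-private memory, a 600 KB shared disk image, and an illustrative threshold of 768 KB, the page reaches the threshold only when the shared image is counted, and its size term drops from 843776 to 229376. The audit should decide whether that shift is intended in each path.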
Known Issue
Tree walk diagnostic dump paths can over-report memory usage when many pages share the same disk image.
The accurate fix would likely require:
- counting only page-private memory in per-page accounting
- separately walking the shared disk hash table to aggregate shared disk image memory per tree
This is currently only a reporting/diagnostic issue and does not affect cache accounting correctness.
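The over-reporting and the proposed fix can be sketched as follows. All names are hypothetical. The flat images array stands in for the shared disk hash table, whose actual layout and iteration API are not described in this ticket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in the shared disk hash table. */
struct shared_image {
    size_t size;        /* bytes held by the disk image */
    unsigned ref_count; /* number of pages referencing the image */
};

/* Hypothetical stand-in for a page; this is not the WT_PAGE layout. */
struct page {
    size_t private_bytes;             /* page-private memory */
    const struct shared_image *image; /* may be shared with other pages */
};

/* Naive per-page sum: counts each shared image once per referencing page. */
static size_t
tree_bytes_naive(const struct page *pages, size_t npages)
{
    size_t total = 0;

    for (size_t i = 0; i < npages; i++)
        total += pages[i].private_bytes + pages[i].image->size;
    return (total);
}

/* Two-pass sum: page-private memory per page, then each image exactly once. */
static size_t
tree_bytes_deduplicated(const struct page *pages, size_t npages,
    const struct shared_image *images, size_t nimages)
{
    size_t total = 0;

    for (size_t i = 0; i < npages; i++)
        total += pages[i].private_bytes;
    for (size_t i = 0; i < nimages; i++) {
        assert(images[i].ref_count > 0); /* only live images are tracked */
        total += images[i].size;
    }
    return (total);
}
```

With three pages sharing one 4096-byte image and a fourth page owning an 8192-byte image, the naive sum counts the shared image three times, while the two-pass sum counts it once.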
Current Assessment
The current eviction and split behavior appears logically correct.
Including shared disk image size in memory_footprint is likely the right behavior for eviction heuristics because evicting a page with ref_count == 1 frees the shared disk image memory along with the page-private memory.
However, we should audit all memory_footprint usages and determine whether:
- the current semantics are correct
- some paths should use page-private memory only
- some statistics/reporting paths require adjusted accounting
Future work may also explore whether eviction heuristics should treat shared disk image size differently for performance tuning.
Definition of Done
- Audit all memory_footprint usages.
- Categorize usages by:
- cache accounting
- eviction heuristics
- split heuristics
- statistics/reporting
- diagnostics/logging
- Verify whether shared disk image size should be included in each usage.
- Document any incorrect or ambiguous semantics.
- Create follow-up tickets for required behavioral or reporting changes.