- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Cache and Eviction
- Storage Engines
- 5
- StorEng - 2024-11-12, StorEng - 2024-12-24, StorEng - 2025-01-21
Related ticket HELP-66173.
We currently have insufficient stats to understand why a page has not been evicted for a long time. There are three potential causes for this in a page's lifetime:
- It isn't found by the server
- The server couldn't add it to the queue
- A worker couldn't evict the queued page
This ticket adds stats to track each of these events. The proposed stats are:
Max difference between page's pass_gen and global pass_gen
This lets us know when a page wasn't found by the server for a long time. Rather than tracking it per page, it can be tracked in a similar way to read_gen_oldest and reported.
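A minimal sketch of how this maximum could be maintained without a new per-page field. It assumes the gap is sampled when the eviction server next visits a leaf page. The names sketch_page, sketch_evict, max_pass_gen_gap and sketch_visit_page are hypothetical stand-ins, not the real WT_PAGE or WT_EVICT definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the page fields relevant here (not the real WT_PAGE). */
typedef struct {
    uint64_t pass_gen; /* Generation of the last eviction pass that visited this page. */
    int is_leaf;       /* Non-zero for leaf pages. */
} sketch_page;

/* Hypothetical stand-in for the global eviction state (not the real WT_EVICT). */
typedef struct {
    uint64_t pass_gen;         /* Current global eviction pass generation. */
    uint64_t max_pass_gen_gap; /* Assumed new stat: largest gap observed so far. */
} sketch_evict;

/*
 * sketch_visit_page --
 *     Called when the eviction server finds a page during its walk. Records how many passes the
 *     page went unseen, then stamps the page with the current generation.
 */
static void
sketch_visit_page(sketch_evict *evict, sketch_page *page)
{
    uint64_t gap;

    /* Only leaf pages contribute, matching the restriction proposed in this ticket. */
    if (page->is_leaf && evict->pass_gen >= page->pass_gen) {
        gap = evict->pass_gen - page->pass_gen;
        if (gap > evict->max_pass_gen_gap)
            evict->max_pass_gen_gap = gap;
    }
    page->pass_gen = evict->pass_gen;
}
```

Only the global maximum is written, so the page itself needs nothing beyond the pass generation it already carries.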
Max failed queue attempts for a page
This is tracked per-page and reported at time of eviction. It lets us know if the eviction server repeatedly found a page but failed to add it to the eviction queue.
Max failed evictions for a page
This is tracked per-page and reported at time of eviction. It lets us know if the eviction server successfully queued the page and the issue came at time of eviction.
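The two per-page stats could be sketched as follows: each failure bumps a counter on the page, and the counters are folded into running maximums when the page is finally evicted. All names here (sketch_page_counters, sketch_evict_stats, the queue_failures and evict_failures fields, and the helper functions) are hypothetical and only illustrate the bookkeeping. They are not existing WiredTiger identifiers.

```c
#include <stdint.h>

/* Hypothetical per-page counters; in the proposal these would be new fields on WT_PAGE. */
typedef struct {
    int is_leaf;             /* Non-zero for leaf pages. */
    uint32_t queue_failures; /* Times the server found the page but could not queue it. */
    uint32_t evict_failures; /* Times a worker dequeued the page but could not evict it. */
} sketch_page_counters;

/* Hypothetical connection-level maximums reported as stats. */
typedef struct {
    uint64_t max_queue_failures;
    uint64_t max_evict_failures;
} sketch_evict_stats;

/* The eviction server found the page but failed to add it to the queue. */
static void
sketch_queue_failed(sketch_page_counters *page)
{
    if (page->is_leaf)
        ++page->queue_failures;
}

/* A worker took the page from the queue but failed to evict it. */
static void
sketch_evict_failed(sketch_page_counters *page)
{
    if (page->is_leaf)
        ++page->evict_failures;
}

/* At successful eviction, fold the page's counters into the reported maximums. */
static void
sketch_evict_succeeded(sketch_evict_stats *stats, const sketch_page_counters *page)
{
    if (!page->is_leaf)
        return;
    if (page->queue_failures > stats->max_queue_failures)
        stats->max_queue_failures = page->queue_failures;
    if (page->evict_failures > stats->max_evict_failures)
        stats->max_evict_failures = page->evict_failures;
}
```

Because the counters are only read at eviction time, a page that is never evicted is never reported by this sketch. That case is covered by the pass_gen difference stat instead.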
These stats should be limited to leaf pages. Internal pages will throw off the results as they're very rarely evicted due to the presence of child pages.
The first stat can be recorded in a similar manner to WT_EVICT::read_gen_oldest and doesn't need a new field on the page. The other two stats would require adding fields to WT_PAGE. I'm reluctant to add fields to WT_PAGE unnecessarily, so these stats should be reviewed carefully and perf tests (WiredTiger and Mongo) run to confirm we're not causing slowdowns by touching potentially hot shared memory.