This is a prerequisite to
An item I've had on my cleanup list for a long time is the fact that the fast-delete code uses the flag WT_REF_READING as a general-purpose short-term lock on the WT_REF structure, which isn't correct (it may be OK to use the flag that way, but then it shouldn't be named WT_REF_READING). This is similar to, if not the same as, how we use WT_REF_LOCKED.
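To make the naming problem concrete, here is a minimal C sketch of the fast-delete pattern described above, under stated assumptions: `struct ref`, `fast_delete()` and the enum values are hypothetical stand-ins (not the real include/btmem.h definitions), and C11 `atomic_compare_exchange_strong` stands in for whatever compare-and-swap primitive the tree actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the current WT_REF states; the real
 * definitions are in include/btmem.h and differ in detail. */
enum ref_state {
    WT_REF_DISK,
    WT_REF_DELETED,
    WT_REF_LOCKED,    /* eviction's exclusive hold */
    WT_REF_MEM,
    WT_REF_READING    /* a page read in progress... or a fast-delete lock */
};

struct ref {                       /* hypothetical, stripped-down WT_REF */
    _Atomic int state;
};

/* Fast-delete as described: use WT_REF_READING as a short-term lock on an
 * on-disk page, then publish the page as deleted. No read ever happens,
 * which is why the name WT_REF_READING is misleading here. */
static bool
fast_delete(struct ref *ref)
{
    int expected = WT_REF_DISK;

    if (!atomic_compare_exchange_strong(&ref->state, &expected, WT_REF_READING))
        return false;              /* not on disk, or another thread won */
    /* ... record the deletion while holding the short-term lock ... */
    atomic_store(&ref->state, WT_REF_DELETED);
    return true;
}
```

Nothing in the state word distinguishes this hold from a genuine page read in progress.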
I think we can go a few different ways:
1. Merge WT_REF_LOCKED and WT_REF_READING into a new flag (maybe WT_REF_EXCLUSIVE or WT_REF_PINNED) that's either a short-term or long-term lock of the WT_REF, and that explicitly carries no information other than that some thread of control has exclusive access to this WT_REF until the status changes. Eviction, fast-delete and page-read would then all use this state for their own purposes.
I think this approach works, but only because I don't see any place in the code that uses WT_REF_LOCKED as a state to mean "eviction owns this WT_REF". If such places exist, then this doesn't work unless we can get rid of them.
2. Rename WT_REF_LOCKED to WT_REF_EVICTION (where it means eviction has the WT_REF locked down), then rename WT_REF_READING to WT_REF_LOCKED (where WT_REF_LOCKED means a short-term lock of the WT_REF, used by fast-delete and page-read).
2(a) would be, I suppose, to rename WT_REF_LOCKED to WT_REF_EVICTION, add WT_REF_FAST_DELETE, and keep WT_REF_READING; then we'd have 3 possible states, not 2. Personally, I don't see real value in having the additional state.
3. Your good idea goes here.
Michael, Alex, I need some feedback on this one.
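A rough sketch of option 1, assuming C11 atomics and a stripped-down WT_REF: the merged state (WT_REF_EXCLUSIVE here; WT_REF_PINNED would work the same) is taken with a compare-and-swap from whatever state the caller expects, and released by publishing the next state. `struct ref`, `ref_lock()` and `ref_unlock()` are hypothetical names for illustration, not existing WiredTiger functions, and the enum values are arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the WT_REF states with WT_REF_LOCKED and
 * WT_REF_READING merged into the proposed WT_REF_EXCLUSIVE (option 1).
 * This is not the real include/btmem.h. */
enum ref_state {
    WT_REF_DISK,
    WT_REF_DELETED,
    WT_REF_EXCLUSIVE,   /* some thread owns the ref; nothing more is implied */
    WT_REF_MEM
};

struct ref {                       /* hypothetical, stripped-down WT_REF */
    _Atomic int state;
};

/* Acquire exclusive access if the ref is currently in `from`. Eviction
 * would pass WT_REF_MEM, page-read WT_REF_DISK or WT_REF_DELETED, and
 * fast-delete WT_REF_DISK. Returns false if the ref is in any other state,
 * including when another thread already holds it. */
static bool
ref_lock(struct ref *ref, int from)
{
    return atomic_compare_exchange_strong(&ref->state, &from, WT_REF_EXCLUSIVE);
}

/* Release exclusive access by publishing the new state. Only the owner
 * calls this, so a plain atomic store is enough. */
static void
ref_unlock(struct ref *ref, int to)
{
    assert(atomic_load(&ref->state) == WT_REF_EXCLUSIVE);
    assert(to != WT_REF_EXCLUSIVE);
    atomic_store(&ref->state, to);
}
```

Because the state records only that someone owns the ref, not who, any code that inspects the state to conclude that eviction owns the ref would lose that information; that's the condition option 1 depends on.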
Here's where and how I think the WT_REF_LOCKED and WT_REF_READING flags are currently used:
- __debug_ref: print out information on the two flags
- __wt_evict_list_clr_page: eviction asserts a page is in the locked state
- __evict_get_page: eviction switches a page from WT_REF_MEM to WT_REF_LOCKED
- __wt_evict_lru_page: eviction asserts a page is in the locked state
- __wt_page_in_func: readers ignore WT_REF_LOCKED, WT_REF_READING pages
- __wt_cache_read: Readers switch a page from WT_REF_DISK or WT_REF_DELETED to WT_REF_READING
- __tree_walk_delete: Cursor delete switches a page from WT_REF_DISK to WT_REF_READING (short-term lock), then sets the page to WT_REF_DELETED.
- __tree_walk_read: Cursor read skips deleted pages; it switches a page from WT_REF_DELETED to WT_REF_READING (short-term lock) so it can check whether the deletion is visible.
- __wt_ref_evict: Eviction asserts the page is in WT_REF_LOCKED state
- __rec_discard_tree: Eviction asserts the page is in WT_REF_LOCKED state
- __rec_review: Eviction ignores pages already in the WT_REF_LOCKED or WT_REF_READING state
- __rec_excl_clear: Eviction asserts a page is locked when returning it to availability
- __hazard_exclusive: Eviction switches a page from WT_REF_MEM to WT_REF_LOCKED.
- __rec_page_modified: Reconciliation switches a page from WT_REF_DELETED to WT_REF_READING (short-term lock) and waits for pages currently set to WT_REF_READING (it can't distinguish between pages being read and pages in fast-delete).
- include/btmem.h: the definitions; the comments there are wrong.
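The ambiguity in __rec_page_modified can be sketched the same way. This is a hypothetical simplification, not the real function: `struct ref` and `rec_lock_deleted()` are invented for illustration, `sched_yield()` (POSIX) stands in for whatever yield call the tree uses, and returning false for states other than WT_REF_DELETED is an assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L   /* expose sched_yield() under strict C modes */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the current WT_REF states (not the real
 * include/btmem.h definitions). */
enum ref_state {
    WT_REF_DISK,
    WT_REF_DELETED,
    WT_REF_LOCKED,
    WT_REF_MEM,
    WT_REF_READING
};

struct ref {                       /* hypothetical, stripped-down WT_REF */
    _Atomic int state;
};

/* Reconciliation's side, as described for __rec_page_modified: lock a
 * deleted ref by moving it to WT_REF_READING, waiting out any thread that
 * already holds it in WT_REF_READING. From the state alone it can't tell
 * whether that holder is a page read or a fast-delete, so it treats both
 * the same. Returning false for other states is an assumption here. */
static bool
rec_lock_deleted(struct ref *ref)
{
    for (;;) {
        int state = atomic_load(&ref->state);

        if (state == WT_REF_READING) {   /* read or fast-delete: can't tell */
            sched_yield();
            continue;
        }
        if (state != WT_REF_DELETED)
            return false;
        /* The CAS fails if another thread moved the ref; loop and re-check. */
        if (atomic_compare_exchange_strong(&ref->state, &state, WT_REF_READING))
            return true;
    }
}
```

Under option 2 the same loop would simply wait on the renamed short-term WT_REF_LOCKED state; only option 2(a) would let it tell a page read from a fast-delete.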