I’ve been working backwards from checkpoint skipping a page it shouldn’t when running the test case in
WT-7958. Here is what I am seeing:
- page P exists on disk with address A and is clean
- checkpoint starts running
- page P is modified, setting first_dirty_txn ahead of the checkpoint
- eviction chooses P to evict (in some tree ahead of the checkpoint)
- eviction reconciles P
- the main part of reconciliation succeeds, but __rec_hs_wrapup fails with EBUSY (__wt_hs_insert_updates has various checks that apply when checkpoint_running == true; I’m not sure exactly which one is failing)
- at this point, ref->addr == NULL && mod->rec_result == 0, and the block for A has been freed; the page is dirty, but first_dirty_txn is ahead of the checkpoint
- checkpoint skips writing P; when it writes P’s parent, it considers P, sees the missing address, and takes the WT_CHILD_IGNORE path. That is, nothing is written, and the original content of P (from the first step above) is missing from the checkpoint
Note that nothing is lost in memory, so the next checkpoint (including a clean shutdown) will write P and fill in the hole.
It looks like reordering __rec_write_wrapup to call __rec_hs_wrapup before it clears out the address will fix this. I’m just checking whether there are any problems with doing that.
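
To make the ordering issue concrete, here is a toy model in C. It is only a sketch: the struct and every toy_* function are hypothetical stand-ins, not the real WiredTiger code, and the history store failure is simulated with a plain flag rather than the actual checks in __wt_hs_insert_updates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the page state discussed above (not the real WT_REF/WT_PAGE_MODIFY). */
struct toy_page {
    const char *addr; /* on-disk address; NULL once it has been cleared */
    bool block_freed; /* true once the old block has been freed */
    int rec_result;   /* 0 means no reconciliation result is recorded */
};

/* Hypothetical stand-in for __rec_hs_wrapup: fails with EBUSY when hs_busy is set. */
static int
toy_hs_wrapup(bool hs_busy)
{
    return (hs_busy ? EBUSY : 0);
}

/* Order described in the report: clear the address first, then do the history store wrapup. */
static int
toy_wrapup_current(struct toy_page *p, bool hs_busy)
{
    int ret;

    assert(p->addr != NULL);
    p->addr = NULL;
    p->block_freed = true;
    if ((ret = toy_hs_wrapup(hs_busy)) != 0)
        return (ret); /* Page is left with no address and no result. */
    p->rec_result = 1;
    return (0);
}

/* Proposed order: do the history store wrapup first, clear the address only on success. */
static int
toy_wrapup_reordered(struct toy_page *p, bool hs_busy)
{
    int ret;

    assert(p->addr != NULL);
    if ((ret = toy_hs_wrapup(hs_busy)) != 0)
        return (ret); /* Address and block are still intact. */
    p->addr = NULL;
    p->block_freed = true;
    p->rec_result = 1;
    return (0);
}

/* Toy version of the parent's decision: a child with no address and no result is ignored. */
static bool
toy_child_ignored(const struct toy_page *p)
{
    return (p->addr == NULL && p->rec_result == 0);
}
```

In this model, a failed toy_wrapup_current leaves the page in the state where the parent would ignore it, while a failed toy_wrapup_reordered leaves the address in place so the parent can still reference the original block. Whether the real __rec_hs_wrapup depends on anything that the address cleanup sets up is exactly the open question above, and the sketch does not answer it.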