Major - P3
I phrased the question badly earlier, but I think there's a real problem here.
I have a test case (attached) that does the following:
- Create a single table, populate it with 1000 key/value pairs.
- Close and re-open the database, so we can fast-delete pages.
- Truncate a chunk of the key/value pairs inside a transaction.
- With the truncation uncommitted, checkpoint the database.
- Re-open the database, as if after a crash (the truncation is never committed).
- Check that all the keys are there.
The output I get is:
Here's what's going on: I'm truncating key/value pairs 290 to 500. The first key on a new page boundary is key 294, so that page is fast-deleted, and after the crash, key 294 is the first key we don't see.
The problem is not that we write the backing leaf page incorrectly (I think we write it correctly). Rather, the internal page has a cell of type WT_CELL_DEL for it, and when we read that cell, we assume all of the keys on the leaf page have been deleted. That's wrong, because the deleting transaction never committed.
I think we need to fix reconciliation so it doesn't write WT_CELL_DEL cells unless the delete is globally visible, but I'd need to stare at the code some more to be sure.
We also need to think about named checkpoints, specifically in the page-reading code. We recently changed that code to skip any deleted-page or lookaside-table handling when reading from a checkpoint handle. I think that's OK, but someone should review it in the context of this ticket.
There may also be logging implications.
Let me know if I'm just missing something!