@michaelcahill, I tried to clean up the last XXXKEITH flag, the fast-delete code.
I think it's mostly right, but there's at least one unsolved problem.
First, would you please review the __wt_tree_walk_delete_rollback, __tree_walk_delete and __wt_page_parent_modify_set functions? They're small, but the comments explain what's going on, and it would be good to make sure I'm not missing something in the solution.
Second, there's a problem if an instantiated page splits:
- thread X starts a transaction,
- page gets marked WT_REF_DELETED by thread X's fast-truncate,
- page gets instantiated by thread Y, converting WT_REF_DELETED to key/value items that have WT_UPDATE structures marked "deleted",
- page is forcibly evicted and split into N different pages, the WT_UPDATE structures are saved/restored, but on multiple pages,
- thread X aborts its transaction, and is shocked to find that the WT_REF it's holding has changed to WT_REF_SPLIT.
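To make the sequence concrete, here's a toy model of it. This is only a sketch: the `toy_*` types and functions are hypothetical stand-ins I made up for illustration. They echo the WT_REF states named above but are not the real WiredTiger definitions, and the rollback logic is a guess at the shape of the problem, not the actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the page reference states; not WiredTiger's definitions. */
typedef enum {
    TOY_REF_DISK,    /* page is on disk, untouched */
    TOY_REF_DELETED, /* page marked deleted by a fast-truncate */
    TOY_REF_MEM,     /* page instantiated in memory */
    TOY_REF_SPLIT    /* page was split into multiple pages */
} toy_ref_state;

typedef struct {
    toy_ref_state state;
} toy_ref;

/* Thread X: fast-truncate marks the page deleted and keeps the ref for rollback. */
static toy_ref *toy_fast_truncate(toy_ref *ref)
{
    ref->state = TOY_REF_DELETED;
    return ref;
}

/* Thread Y: reading the page turns the deleted ref into an in-memory page. */
static void toy_instantiate(toy_ref *ref)
{
    if (ref->state == TOY_REF_DELETED)
        ref->state = TOY_REF_MEM;
}

/* Eviction: a forced eviction splits the in-memory page. */
static void toy_forced_split(toy_ref *ref)
{
    if (ref->state == TOY_REF_MEM)
        ref->state = TOY_REF_SPLIT;
}

/* Thread X: rollback only knows how to handle the states it expects. */
static int toy_rollback(toy_ref *ref)
{
    switch (ref->state) {
    case TOY_REF_DELETED: /* never instantiated: just undo the mark */
        ref->state = TOY_REF_DISK;
        return 0;
    case TOY_REF_MEM: /* instantiated: the updates are all on this one page */
        return 0;
    default: /* e.g. TOY_REF_SPLIT: the updates are now spread over other pages */
        return -1;
    }
}

/* Run the whole sequence; returns the result of thread X's rollback. */
static int toy_run_scenario(int split)
{
    toy_ref ref = { TOY_REF_DISK };
    toy_ref *saved = toy_fast_truncate(&ref);

    toy_instantiate(&ref);
    if (split)
        toy_forced_split(&ref);
    return toy_rollback(saved);
}
```

Calling `toy_run_scenario(0)` models the case that works today; `toy_run_scenario(1)` models the split case, where rollback has nothing sensible to do with the reference it saved.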
I suppose we could flag the page so it can't split. This shouldn't be a common case, and I can't imagine it's a performance problem to disallow forcible eviction of pages entirely deleted by a fast-truncate call. I don't like it much, though; it feels ugly.
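Roughly, the flag approach would look something like the sketch below. Everything here is hypothetical: the `sketch_page` type, the `truncate_unresolved` field and the function names are assumptions for illustration, not existing WiredTiger code, and it ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy page; the field is an assumption, not a real WiredTiger flag. */
typedef struct {
    /* Set while the fast-truncate transaction that deleted the page is unresolved. */
    bool truncate_unresolved;
} sketch_page;

/* Called when a fast-truncated page is instantiated in memory. */
static void sketch_instantiate_deleted(sketch_page *page)
{
    page->truncate_unresolved = true;
}

/* Called once the truncating transaction commits or rolls back. */
static void sketch_txn_resolved(sketch_page *page)
{
    page->truncate_unresolved = false;
}

/* Forced eviction (and so a split) is refused while the flag is set. */
static bool sketch_can_force_evict(const sketch_page *page)
{
    return !page->truncate_unresolved;
}
```

The point is only that the eviction path would need one extra check, and that something has to clear the flag when the transaction resolves.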
Another possible solution might be to update the rollback information as part of instantiating a fast-truncate page (probably by adding individual update references), so when the transaction rolls back the right thing happens. The obvious problem there is that the thread/transaction that instantiated the page is not the thread/transaction that did the original delete. We could add the updates to the instantiating session/transaction pair instead, but then we'd somehow have to figure out during rollback that the transaction we're rolling back has modifications in two different sessions, so that's not looking good, either.
Anyway, I'm hoping you have a better solution. Let's talk when you have a few minutes.