Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-9720

Clean up and reorganize the fast-truncate code

    • Storage Engines - 2022-08-22, Storage Engines - 2022-09-05

      There are a number of things about the fast-truncate code that are currently messy, as the result of first it being neglected for a while and then (this past winter and spring) underestimating what was necessary to get it running again, then needing to be more concerned with getting it running than tidying it.

      These issues are now behind us. I have gone through and cleaned out a bunch of things that I'd been taking notes on earlier.

      Major or more notable changes:

      • Move ref->ft_info.updates to the page-modify structure, which always exists when it's needed. And instead of moving ref->ft_info.del (the page-deleted structure) to the modify structure at instantiation time, leave it where it is. This means we no longer need a union in the ref, so change ref->ft_info.del to be just ref->page_del. This in particular makes everything a lot simpler.
      • In eviction, if we evict an instantiated page, set its state back to WT_REF_DELETED. This is now straightforward to do (earlier on it was a horrible mess) and this means all the states and baggage associated with deleted pages actually being in WT_REF_DISK state (which was only possible for readonly trees so not normally even reachable and thus hard to test) can be (and I believe has been) G/C'd.
      • Skip reading and instantiating deleted pages if the deletion is globally visible. Instead, make a new empty page. This requires pretending that we loaded that empty page from the original on-disk page image, which is potentially problematic. This change got rejected earlier on because it seemed likely to break. There were in fact some ways it would have broken then.
      • Set page_del->committed when committing a truncation affecting an already-instantiated page. (This was already correct for prepare and abort.) I believe this to be the fix for WT-9713; I saw the problem in this branch and fixed it, but figured it had only become an issue because of the other cleanup.
      • Don't mark instantiated pages dirty by default. It is in principle unnecessary (because instantiation should be an identity operation) and with other changes in this cleanup it becomes workable. In particular, the eviction improvement above coupled with keeping the page_del structure in the ref after instantiation makes it work; we can discard clean instantiated pages and they turn back into on-disk deleted pages. Note that VLCS pages will still end up dirty because they are instantiated via col_modify; however, row-store pages won't.
      • Add code in RTS to not skip over clean instantiated pages. (Previously in rollback_page_needs_abort, all instantiated pages were dirty so the timestamps didn't matter; that's no longer true and since the deletion info isn't reflected in the time aggregate it now has to be checked.)
      • Kill off the last uses of wt_page_del_active in favor of wt_page_del_visible.
      • Split wt_page_del_visible into wt_page_del_visible and wt_page_del_visible_all. I had hoped this would make one of them simpler (that didn't pan out) and the resultant two functions are nearly identical, so this may be controversial. But I think it's good to keep the visible and visible_all paths separate and explicit at the call sites.
      • Make the special-case behavior of wt_page_del_visible/visible_all regarding prepared truncations optional. It turns out there is only one place that uses this option.
      • Note: this changes the third argument of wt_page_del_visible from meaning "actually visible_all" (prior to these changes) to meaning "make prepared truncations invisible" (after these changes). This is not as dangerous as it sounds because it happened as distinct commits separated by testing.
      • Change all the direct calls to wt_txn_visible/visible_all on page_del info to calls to wt_page_del_visible/visible_all. It turns out that all of these (there were three, though one was added in this changeset) should have been (and weren't) using the special-case behavior for prepared truncates.
      • Add two new tests that test two of these cases. The third case (in wt_btcur_skip_page) turns out to be inaccessible. It seems that the pages under the first and last rows of a truncate are always slow-truncated, regardless of where the page boundaries are, and this makes it impossible to get cursor_next/prev to step to a deleted page when the deletion is prepared; they will always step to a slow-truncated item first, see the prepared tombstone, and fail on that. The other two (wt_delete_page_skip and the new code to skip reading pages with a globally visible truncation) are by no means inaccessible (though not readily or all that easily accessible except on purpose) and the new tests variously segfault and assert without the fix.
      • Simplify the logic for the visibility checks that are made when we're possibly going to check both private and global visibility. There are two cases of this and they now look more or less the same.

            alexander.gorrod@mongodb.com Alexander Gorrod
            dholland+wt@sauclovia.org David Holland
            0 Vote for this issue
            5 Start watching this issue