Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-8221

Compare write generation number before performing RTS

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • WT10.0.1, 5.2.0, 5.1.0-rc2, 5.0.5
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • 8
    • Storage - Ra 2021-10-18

      While working on WT-8074 I found that the modifications I was making to __wt_time_window_clear_obsolete caused testing fallout, I then went and tested the same on develop.

      If one comments out the call to __wt_time_window_clear_obsolete on develop it will result in the following tests failing reliably:

      • test_txn16
      • test_tiered03
      • test_import03
      • test_import04
      • test_import09

      __wt_time_window_clear_obsolete was originally implemented as an optimization and shouldn't have any significant impact on WiredTiger's behaviour. However given the numerous failures it's clear this is not true.

      I have spent some time chasing the failure on test_import03 and I think the following might be happening:

      1. The test imports a file into a new database and calls session->verify.
      2. The verify code requires exclusive access to the dhandle so it calls __wt_exclusive_handle_operation
      3. This then calls __checkpoint_tree which dirties the root node of the tree.
      4. As we're doing a closing checkpoint we call __wt_evict_file which reconciles the root node as it's been dirtied. It doesn't touch any child nodes here as they're clean.
      5. During the reconciliation of the root node we call __wt_cell_unpack_addr, which is followed by: __cell_unpack_window_cleanup, the latter call removes the value held in tw->newest_txn for the root node.
      6. Following that the root node's time-window is compared to it's child's, the child's time-window still has the newest_txn value that was cleared on the parent. As such the child's time-window is now newer than the parents. As its newest_txn is greater than it's parents.
      7. This fails the verification of the b-tree.

      I wrote a "fix" for the test_import03 failure to get a sense of whether I was looking in the right place, the following patch (which includes the remove of the time window clearing function in rec_visbility.c) should get test_import03 to pass (but not all failing tests will pass):

      diff --git a/src/evict/evict_file.c b/src/evict/evict_file.c
      index f6091a443..409d53758 100644
      --- a/src/evict/evict_file.c
      +++ b/src/evict/evict_file.c
      @@ -46,7 +46,10 @@ __wt_evict_file(WT_SESSION_IMPL *session, WT_CACHE_OP syncop)
           WT_ERR(__wt_tree_walk(session, &next_ref, walk_flags));
           while ((ref = next_ref) != NULL) {
               page = ref->page;
      +        if (syncop == WT_SYNC_CLOSE) {
      +            WT_ERR(__wt_page_modify_init(session, page));
      +            __wt_page_modify_set(session, page);
      +        }
                * Eviction can fail when a page in the evicted page's subtree switches state. For example,
                * if we don't evict a page marked empty, because we expect it to be merged into its parent,
      diff --git a/src/reconcile/rec_visibility.c b/src/reconcile/rec_visibility.c
      index 24b8e862d..6d17a1e38 100644
      --- a/src/reconcile/rec_visibility.c
      +++ b/src/reconcile/rec_visibility.c
      @@ -776,8 +776,8 @@ __wt_rec_upd_select(WT_SESSION_IMPL *session, WT_RECONCILE *r, WT_INSERT *ins, W
             !vpack->tw.prepare && (upd_saved || F_ISSET(vpack, WT_CELL_UNPACK_OVERFLOW)))
               WT_ERR(__rec_append_orig_value(session, page, upd_select->upd, vpack));
      -    __wt_time_window_clear_obsolete(
      -      session, &upd_select->tw, r->rec_start_oldest_id, r->rec_start_pinned_ts);
      +    //__wt_time_window_clear_obsolete(
      +    //  session, &upd_select->tw, r->rec_start_oldest_id, r->rec_start_pinned_ts);
           __wt_scr_free(session, &tmp);
           return (ret);

      The fix isn't a real fix as it dirties the child page manually to get the child's time window to be re-written. The actual issue may be more to do with write generations but I'm unsure at this point. It's worth noting that test_import03 is a column store test but I don't think this a column store specific issue.

      The issue I saw can be described as:

      1. We dirty the root page
      2. We clean it's time-window if it's dsk write_gen is older than the session write_gen.
      3. If the children of the root page are clean they won't be re-written and in theory will have their time-window's made invalid with respect to their parent's.

      The work on this ticket is to:

      • Implement the breaking change by commenting out the call to: __wt_time_window_clear_obsolete in rec_visibility.c
      • Investigate the test failures and suggest a solution.

            haribabu.kommi@mongodb.com Haribabu Kommi
            luke.pearson@mongodb.com Luke Pearson
            0 Vote for this issue
            9 Start watching this issue