WiredTiger / WT-7737

internal page eviction triggers diagnostic assert with WT_DHANDLE_EXCLUSIVE

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      The function wt_page_can_evict() has this check:

      /*  
       * If a split created new internal pages, those newly created internal pages cannot be evicted
       * until all threads are known to have exited the original parent page's index, because evicting
       * an internal page discards its WT_REF array, and a thread traversing the original parent page
       * index might see a freed WT_REF.
       *   
       * One special case where we know this is safe is if the handle is locked exclusive (e.g., when
       * the whole tree is being evicted). In that case, no readers can be looking at an old index.
       */  
      if (F_ISSET(ref, WT_REF_FLAG_INTERNAL) && !F_ISSET(session->dhandle, WT_DHANDLE_EXCLUSIVE) &&
        __wt_gen_active(session, WT_GEN_SPLIT, page->pg_intl_split_gen))
          return (false);
      

      And the function wt_evict() has this check:

      /* Check that we are not about to evict an internal page with an active split generation. */
      if (F_ISSET(ref, WT_REF_FLAG_INTERNAL) && !closing)
          WT_ASSERT(session, !__wt_gen_active(session, WT_GEN_SPLIT, page->pg_intl_split_gen));
      

      As you can see, they’re not quite the same: wt_page_can_evict() can decide a page is evictable based on WT_DHANDLE_EXCLUSIVE, and wt_evict() can then assert during the actual eviction if we’re not closing the object.
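      To make the mismatch concrete, here is a minimal model of the two checks. It is a sketch, not WiredTiger code: the struct and the model_* functions are hypothetical stand-ins, and each boolean simply replaces the flag test or function call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state the two checks consult. */
struct evict_state {
    bool internal_page;    /* F_ISSET(ref, WT_REF_FLAG_INTERNAL) */
    bool handle_exclusive; /* F_ISSET(session->dhandle, WT_DHANDLE_EXCLUSIVE) */
    bool closing;          /* the "closing" value tested in wt_evict() */
    bool split_gen_active; /* __wt_gen_active(session, WT_GEN_SPLIT, ...) */
};

/* Models the wt_page_can_evict() check: false means "do not evict". */
static bool
model_can_evict(const struct evict_state *s)
{
    if (s->internal_page && !s->handle_exclusive && s->split_gen_active)
        return (false);
    return (true);
}

/* Models the wt_evict() check: false means the diagnostic assert would fire. */
static bool
model_evict_assert_holds(const struct evict_state *s)
{
    if (s->internal_page && !s->closing)
        return (!s->split_gen_active);
    return (true);
}

/* The problem case: the page is accepted for eviction, then the assert fires. */
static bool
model_mismatch(const struct evict_state *s)
{
    return (model_can_evict(s) && !model_evict_assert_holds(s));
}

/* Enumerate all 16 flag combinations and count the mismatching ones. */
static int
model_count_mismatches(void)
{
    int bits, count = 0;

    for (bits = 0; bits < 16; bits++) {
        struct evict_state s;

        s.internal_page = (bits & 1) != 0;
        s.handle_exclusive = (bits & 2) != 0;
        s.closing = (bits & 4) != 0;
        s.split_gen_active = (bits & 8) != 0;
        if (model_mismatch(&s))
            count++;
    }
    return (count);
}
```

      In this model exactly one combination disagrees: an internal page, an exclusive handle, an active split generation, and closing not set.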

      (Note that the comment in wt_page_can_evict() implies WT_DHANDLE_EXCLUSIVE is connected to full-tree eviction, so closing is presumably true. I think that’s correct in a global sense, but it’s not necessarily true RIGHT NOW: holding the handle exclusively only guarantees that we will discard the tree before another session gets access. If the cache is small or stressed enough, we can reasonably evict internal pages before the close.)

      I triggered the wt_evict() assert by changing rollback-to-stable (RTS) to acquire its handles with WT_DHANDLE_EXCLUSIVE. That was part of the changes to allow salvage to call RTS; it seemed like a bad idea to do RTS with other open handles on an object. The change triggered this race: we pass the check in wt_page_can_evict(), but assert in wt_evict() because we’re evicting an internal page that looks like it’s got a dangerous split-generation.

      I plan to fix this by checking for WT_DHANDLE_EXCLUSIVE in wt_evict(). I think this means we’re saying “if you have the handle exclusively, there’s no point in doing split-generations at all, nobody else can get to the pindex structures”. That sounds correct to me. The fix includes removing the RTS code that tracks split-generations, and reviewing the other places where code holding an exclusive handle does page-generation tracking, with possible changes to remove that work.
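      A rough sketch of what checking WT_DHANDLE_EXCLUSIVE in wt_evict() could look like, in the form of a small standalone model. This assumes the flag is added as one more condition guarding the assert; the issue does not state the exact form of the change, and the struct and fix_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state the two checks consult. */
struct fix_state {
    bool internal_page;    /* F_ISSET(ref, WT_REF_FLAG_INTERNAL) */
    bool handle_exclusive; /* F_ISSET(session->dhandle, WT_DHANDLE_EXCLUSIVE) */
    bool closing;          /* the "closing" value tested in wt_evict() */
    bool split_gen_active; /* __wt_gen_active(session, WT_GEN_SPLIT, ...) */
};

/* Models the existing wt_page_can_evict() check, unchanged. */
static bool
fix_can_evict(const struct fix_state *s)
{
    if (s->internal_page && !s->handle_exclusive && s->split_gen_active)
        return (false);
    return (true);
}

/*
 * Models the wt_evict() check with the assumed fix: the assert is skipped
 * when the handle is held exclusively, as well as when closing.
 */
static bool
fix_evict_assert_holds(const struct fix_state *s)
{
    if (s->internal_page && !s->closing && !s->handle_exclusive)
        return (!s->split_gen_active);
    return (true);
}

/* True if, for every flag combination, an evictable page never trips the assert. */
static bool
fix_checks_agree(void)
{
    int bits;

    for (bits = 0; bits < 16; bits++) {
        struct fix_state s;

        s.internal_page = (bits & 1) != 0;
        s.handle_exclusive = (bits & 2) != 0;
        s.closing = (bits & 4) != 0;
        s.split_gen_active = (bits & 8) != 0;
        if (fix_can_evict(&s) && !fix_evict_assert_holds(&s))
            return (false);
    }
    return (true);
}
```

      With the extra condition, every state that passes the can-evict model also satisfies the assert model, which is the consistency the fix is after.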

            Assignee: Keith Bostic (Inactive), keith.bostic@mongodb.com
            Reporter: Keith Bostic (Inactive), keith.bostic@mongodb.com
            Votes: 0
            Watchers: 2
