WiredTiger / WT-188

Feature/eviction fixes

    • Type: Task
    • Resolution: Done
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Today I have been running test/format in threaded mode against the current tree to try to reproduce some failures that have been reported. The CONFIG file looks like this:

      bzip=0
      file_type=row
      delete_pct=0
      cache=3
      runs=100
      rows=100000
      ops=50000

      I'm running "./t -t 8" but the failures don't seem too dependent on the number of threads.

      The changes in this branch make this test run, but I am not happy with them: I'm publishing them in case the failures are blocking anyone.

      The failures I saw were caused by a thread pulling an invalid page off the eviction list. With multiple evicting threads, a parent page could be chosen for eviction before its children, which were also on the list. In that case, once the parent's eviction had discarded its children, the list could still hold pointers to those child pages, which were no longer valid.
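
      To make the hazard concrete, here is a minimal sketch of clearing a parent's descendants from an eviction queue before the parent is evicted. It is not the code in this branch: struct page, struct evict_queue, is_descendant and queue_clear_descendants are hypothetical names invented for illustration, and the real WiredTiger structures are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified page: only the parent link matters here. */
struct page {
    struct page *parent;    /* NULL for the root page */
};

#define QUEUE_SLOTS 8

/* Hypothetical eviction queue: a fixed array of candidate pages. */
struct evict_queue {
    struct page *slot[QUEUE_SLOTS];    /* NULL marks an empty slot */
};

/* Return 1 if `page` lies strictly below `ancestor` in the tree. */
static int
is_descendant(const struct page *page, const struct page *ancestor)
{
    assert(page != NULL && ancestor != NULL);
    for (page = page->parent; page != NULL; page = page->parent)
        if (page == ancestor)
            return (1);
    return (0);
}

/*
 * Clear every slot that refers to a descendant of `parent`, so that no
 * other thread can later pull a pointer to a page that the parent's
 * eviction has discarded. Returns the number of slots cleared.
 */
static size_t
queue_clear_descendants(struct evict_queue *q, const struct page *parent)
{
    size_t cleared, i;

    assert(q != NULL && parent != NULL);
    cleared = 0;
    for (i = 0; i < QUEUE_SLOTS; i++)
        if (q->slot[i] != NULL && is_descendant(q->slot[i], parent)) {
            q->slot[i] = NULL;
            cleared++;
        }
    return (cleared);
}
```

      The sketch ignores locking entirely: it only shows which entries have to go, and it assumes nothing else touches the queue while it is being scanned.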

      Even after clearing child pages from the LRU queue, there were still cases where a page pointer was found on the LRU queue in a strange state (e.g., with a non-NULL parent but a NULL ref). I don't yet fully understand this, but holding the lru_lock while walking files appears to fix it. I would rather not hold the lock for the whole walk, though: avoiding that is why the WT_REF_EVICT_WALK state was introduced.
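
      For anyone following up, the shape of the "hold the lock while walking" workaround is roughly as below. This is a sketch under assumptions, not the code in this branch: only the lru_lock name comes from this ticket, while struct lru_queue, lru_init, lru_fill and lru_pop are hypothetical, and a plain pthread mutex stands in for whatever lock type the tree actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LRU_SLOTS 4

/* Hypothetical LRU queue shared by the walker and the evicting threads. */
struct lru_queue {
    pthread_mutex_t lru_lock;    /* protects slot[] and count */
    void *slot[LRU_SLOTS];
    size_t count;
};

static void
lru_init(struct lru_queue *q)
{
    pthread_mutex_init(&q->lru_lock, NULL);
    q->count = 0;
}

/*
 * Walker side: the lock is held across the whole batch, so evicting
 * threads never see a half-filled queue. Returns the number of pages
 * queued.
 */
static size_t
lru_fill(struct lru_queue *q, void **pages, size_t npages)
{
    size_t added;

    added = 0;
    pthread_mutex_lock(&q->lru_lock);
    while (added < npages && q->count < LRU_SLOTS)
        q->slot[q->count++] = pages[added++];
    pthread_mutex_unlock(&q->lru_lock);
    return (added);
}

/* Evictor side: take one candidate, or NULL if the queue is empty. */
static void *
lru_pop(struct lru_queue *q)
{
    void *page;

    page = NULL;
    pthread_mutex_lock(&q->lru_lock);
    if (q->count > 0)
        page = q->slot[--q->count];
    pthread_mutex_unlock(&q->lru_lock);
    return (page);
}
```

      The cost is visible in lru_fill: every evicting thread blocks in lru_pop for as long as the walker holds the lock, which is the serialization I would like to avoid.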

      Anyway, I'm not proposing that these changes be merged, only making them available here for any followup work on this bug.

            Assignee: Unassigned
            Reporter: Michael Cahill (Inactive), michael.cahill@mongodb.com
            Votes: 0
            Watchers: 1

              Created:
              Updated:
              Resolved: