Investigate slow eviction behavior when updates ratio is high

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Cache and Eviction
    • Storage Engines - Transactions
    • 4,127.458
    • SE Transactions - 2026-01-16, SE Transactions - 2026-01-30, SE Transactions - 2026-02-13
    • None

      This serves as an umbrella ticket for investigations and other work to address an eviction problem that we have seen in several 7.x and 8.x systems. 

      The common theme in these incidents is that the WT cache exceeds the eviction_updates_trigger, which causes WT to use server threads entering WT to help with eviction and degrades performance. Despite using server threads for eviction, the cache remains at or above the eviction_updates_trigger for an extended period, indicating that WT is struggling to find and evict update content from the cache.

      Other common characteristics include:

      • Update ratio that exceeds the dirty ratio (sometimes by a lot), indicating that a lot of the update content is on clean pages
      • Systems that have already increased the eviction_updates_trigger but are now hitting the increased trigger value
      • Behavior only seen on one or two replicas; other nodes appear healthy
      • Very large WT caches: all instances have occurred on systems with >= 72 GB of cache
      • Workload includes 2,000+ updates/second
      • 100 - 500 active dhandles, implying that some dhandles have many pages in the cache
      • The systems are trying to evict update content (WT_EVICT_CACHE_UPDATES is set)
      • Mostly the systems are not evicting unmodified pages, except when evict_target is hit and the system looks for clean pages to evict. So even with WT_EVICT_CACHE_UPDATES set, eviction is not finding (or not choosing to evict) the clean pages with updates.
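
      As a rough aid for checking a node against the first characteristic above, the sketch below computes the update ratio and the dirty ratio as percentages of the configured cache size. It assumes the input is a dict shaped like the WiredTiger cache statistics section, with the keys "maximum bytes configured", "bytes allocated for updates", and "tracked dirty bytes in the cache"; the function names and the trigger argument are hypothetical and only for illustration.

```python
def cache_ratios(cache_stats):
    """Return (update_pct, dirty_pct) relative to the configured cache size.

    Assumes cache_stats uses WiredTiger-style cache statistic names.
    """
    max_bytes = cache_stats["maximum bytes configured"]
    if max_bytes <= 0:
        raise ValueError("configured cache size must be positive")
    update_pct = 100.0 * cache_stats["bytes allocated for updates"] / max_bytes
    dirty_pct = 100.0 * cache_stats["tracked dirty bytes in the cache"] / max_bytes
    return update_pct, dirty_pct


def matches_pattern(cache_stats, updates_trigger_pct):
    """True when updates are at or above the trigger AND exceed the dirty ratio.

    updates_trigger_pct is the node's configured eviction_updates_trigger,
    supplied by the caller (this sketch does not assume a default).
    """
    update_pct, dirty_pct = cache_ratios(cache_stats)
    return update_pct >= updates_trigger_pct and update_pct > dirty_pct
```

      This only flags the ratio pattern; it says nothing about the other characteristics (replica asymmetry, dhandle counts, update rate).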

      Remediation

      We do not yet understand the root cause of this behavior. Until we have a fix, the most effective way to address this is to increase the eviction_updates_trigger until the problem goes away. Typically, at higher trigger values WiredTiger's normal eviction finds and evicts update content without needing to pull in server threads.

      We have seen success with update triggers as high as 95%, but recommend raising it incrementally and observing the effect to be on the safe side.
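
      One way to plan an incremental increase is sketched below. The step size of 5 percentage points is an assumption for illustration, not a recommendation from this ticket; the 95% ceiling is the highest value reported as successful above. Both helper names are hypothetical. The second helper only builds a WiredTiger configuration string; how that string is applied to a running system is left out.

```python
def trigger_steps(current_pct, ceiling_pct=95, step_pct=5):
    """List the trigger values to try, in order, from current up to ceiling.

    step_pct is an illustrative assumption; pick a step that suits the system.
    """
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    steps = []
    value = current_pct + step_pct
    while value < ceiling_pct:
        steps.append(value)
        value += step_pct
    # Always finish exactly at the ceiling, unless we are already there.
    if current_pct < ceiling_pct:
        steps.append(ceiling_pct)
    return steps


def runtime_config(trigger_pct):
    """Build the configuration string for one step."""
    return f"eviction_updates_trigger={trigger_pct}"
```

      For example, starting from a trigger of 80, the plan is 85, 90, then 95, observing the effect after each change before moving to the next value.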

      There are two potential downsides to using large values for eviction_updates_trigger.

      1. High amounts of update content in the WT cache can lead to greater memory fragmentation, increasing the server's memory footprint. On systems with limited memory, this increases the risk of OOM kills. But since this problem occurs on systems with large memory, this is less of a concern. You can monitor the system's available memory to track how much memory remains for allocation.
      2. Having more updates in the WT cache reduces the number of pages WT can cache, increasing the cache miss rate and degrading performance. While this is undesirable, our experience is that this is a substantially smaller performance penalty than that caused when the system exceeds the eviction_updates_trigger.
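
      For the memory monitoring mentioned in the first point, a minimal sketch, assuming a Linux host where /proc/meminfo exposes a MemAvailable line. The function names are hypothetical.

```python
def mem_available_kib(meminfo_text):
    """Extract the MemAvailable value (in kB as reported) from meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   <number> kB"
            return int(line.split()[1])
    raise KeyError("MemAvailable not found")


def read_mem_available_kib(path="/proc/meminfo"):
    """Read the current value from the running system (Linux only)."""
    with open(path) as f:
        return mem_available_kib(f.read())
```

      Sampling this value before and after each trigger increase gives a simple way to see whether the larger update footprint is eating into the memory headroom.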

      Please note that setting legacy_page_visit_strategy=true is NOT recommended for addressing this issue.

      I have seen this recommended on a number of HELP tickets related to this problem, but have not seen any evidence that it was helpful. I have also not seen a cogent theory for why it should be helpful.

      I asked Glean to examine tickets where we tried setting this option to address WT-15538, and it agreed that:

      Clear “this fixed it” evidence for legacy_page_visit_strategy in WT-15538 cases is essentially absent.

       

        1. image-2026-01-25-21-18-19-081.png (317 kB)
        2. image-2026-02-01-17-24-54-813.png (138 kB)
        3. Screenshot 2026-05-15 at 11.19.13.png (598 kB)
        4. Too many updates.png (289 kB)

            Assignee: [DO NOT USE] Backlog - Storage Engines Team
            Reporter: Monica Ng
            Votes: 1
            Watchers: 41
