  1. WiredTiger
  2. WT-8996

Review update eviction queuing logic

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None

      As of the 4.4 release, we added a new eviction trigger/target pair. It tracks the volume of updates present in the cache.

      Before the 4.4 release (where we introduced durable history), content associated with updates overlapped almost entirely with dirty content in cache. Once durable history was introduced, it became much more common to have clean pages in cache that also had update structures allocated to them.
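
      To make the trigger/target idea concrete, here is a minimal Python sketch (a model, not WiredTiger code). It assumes the usual reading of the two thresholds: background eviction starts at the target, and application threads are pulled in at the trigger. It also assumes that an updates setting of 0 falls back to half of the corresponding dirty setting, and that the dirty target defaults to 5%. The names `Threshold`, `updates_threshold` and `eviction_level` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A trigger/target pair, both expressed as a percentage of cache size."""
    target: float   # background eviction starts working at this level
    trigger: float  # application threads are pulled into eviction at this level


def updates_threshold(dirty: Threshold, target: float = 0.0, trigger: float = 0.0) -> Threshold:
    """Resolve the updates pair.

    Assumption: a value of 0 means "half of the matching dirty value".
    """
    return Threshold(
        target=target if target > 0 else dirty.target / 2,
        trigger=trigger if trigger > 0 else dirty.trigger / 2,
    )


def eviction_level(used_pct: float, threshold: Threshold) -> str:
    """Classify how hard eviction has to work for one tracked category."""
    if used_pct >= threshold.trigger:
        return "application"  # application threads help evict
    if used_pct >= threshold.target:
        return "background"   # eviction workers only
    return "idle"


# Assumed defaults: dirty target 5%, dirty trigger 20%.
DIRTY = Threshold(target=5.0, trigger=20.0)
UPDATES = updates_threshold(DIRTY)
```

      Under these assumptions the updates pair resolves to a 2.5% target and a 10% trigger, so updates can reach their trigger while dirty content is still well below its own.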

      One of the primary reasons WiredTiger strictly limits the volume of dirty content allowed in cache (default 20%) is that dirty content in cache is often made up primarily of many small memory allocations. Each of those allocations incurs a small overhead in the memory allocator (tcmalloc in the case of MongoDB). When WiredTiger allows too much dirty content (and thus too many small memory allocations), memory fragmentation and the allocator's memory consumption increase significantly, sometimes to the point where the operating system's out-of-memory (OOM) killer terminates the process.

      That is the long explanation of why we started managing the proportion of the cache that can be occupied by "updates" (each update can be thought of as a small memory allocation). Reviewing content in cache, assessing whether it contributes meaningfully to update content, and queuing it for eviction was a reasonably new piece of functionality.

      The eviction server tracks the types of pages that currently need to be evicted. The choice is between clean, dirty and updates (or any combination of the three).
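
      The "any combination of the three" state can be pictured as a small set of flags. The following Python sketch is a simplified model, not the eviction server itself: the `EvictFlag` names are hypothetical, the default percentages are assumptions, and tying the clean flag to overall cache usage is a simplification.

```python
from enum import Flag, auto


class EvictFlag(Flag):
    """Kinds of content the eviction server currently wants to evict (hypothetical names)."""
    CLEAN = auto()
    DIRTY = auto()
    UPDATES = auto()


def eviction_needs(
    total_pct: float,
    dirty_pct: float,
    updates_pct: float,
    *,
    total_target: float = 80.0,   # assumed default, percent of cache
    dirty_target: float = 5.0,    # assumed default, percent of cache
    updates_target: float = 2.5,  # assumed: half of the dirty target
) -> EvictFlag:
    """Return the combination of flags for the current cache usage."""
    flags = EvictFlag(0)  # nothing needs evicting
    if total_pct >= total_target:
        flags |= EvictFlag.CLEAN
    if dirty_pct >= dirty_target:
        flags |= EvictFlag.DIRTY
    if updates_pct >= updates_target:
        flags |= EvictFlag.UPDATES
    return flags
```

      For example, `eviction_needs(50.0, 1.0, 12.0)` yields only `EvictFlag.UPDATES`: overall and dirty usage are within limits, but updates alone require eviction.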

      We have seen a few cases where clean and dirty content are within acceptable limits, but the updates content is at the threshold and eviction is not able to make progress freeing space in the cache. We should review the code that finds and queues pages (especially when only updates are being looked for) and ensure it is not skipping content that could be evicted to make space available for new updates.
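
      As an illustration of what the review should check, here is a hedged Python sketch of a page filter for the queuing walk. It is a hypothetical model rather than the actual WiredTiger code: `Page`, `wants_page` and `queue_candidates` are invented names, and `update_bytes` stands in for whatever the real code uses to tell that a page carries update structures. The property it demonstrates is that, when only updates are being looked for, a clean page that holds updates must still be queued.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import List


class EvictFlag(Flag):
    """Kinds of content the walk is looking for (hypothetical names)."""
    CLEAN = auto()
    DIRTY = auto()
    UPDATES = auto()


@dataclass
class Page:
    name: str
    modified: bool     # True if the page is dirty
    update_bytes: int  # memory held by update structures on this page


def wants_page(page: Page, flags: EvictFlag) -> bool:
    """A page is a candidate if it matches any kind of content being looked for."""
    return bool(
        (EvictFlag.CLEAN in flags and not page.modified)
        or (EvictFlag.DIRTY in flags and page.modified)
        # Updates are checked independently of clean/dirty state.
        or (EvictFlag.UPDATES in flags and page.update_bytes > 0)
    )


def queue_candidates(pages: List[Page], flags: EvictFlag) -> List[str]:
    """Return the names of the pages the walk would queue, in walk order."""
    return [page.name for page in pages if wants_page(page, flags)]


PAGES = [
    Page("clean-no-updates", modified=False, update_bytes=0),
    Page("clean-with-updates", modified=False, update_bytes=4096),
    Page("dirty-with-updates", modified=True, update_bytes=8192),
]
```

      With only `EvictFlag.UPDATES` set, this filter queues both `clean-with-updates` and `dirty-with-updates`. A filter that consulted only the clean/dirty state would queue nothing in that situation, which is the kind of skipped content described above.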

            Assignee:
            backlog-server-storage-engines Backlog - Storage Engines Team
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved:
