Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-7052

Investigate WT cache eviction improvements based on newer cache replacement algorithms

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      We should investigate improvements to the WT cache eviction algorithm based on newer cache replacement algorithms.

      WiredTiger approximates a widely-used cache replacement algorithm—LRU with midpoint insertion—to provide scan resistance. I.e., to avoid the problem where a one-time scan of a large table causes everything else to be evicted in a standard LRU cache.  In a traditional implementation, this works using a conventional LRU list.  When when new data is added to the cache, it is inserted at the middle of the LRU list.  When data is used a second time (i.e., when there is a cache hit) the data is moved to the beginning of the LRU list.  Thus, data that is just touched once during a sequential scan will enter the middle of the LRU list and progress to the end, at which point it will be evicted.  Hot data that has been promoted to the beginning of the LRU list remains in the cache and isn't displaced by the scan. 

      WiredTiger approximates LRU cache replacement using a read generation value assigned to each page.  WT prioritizes pages with older read generation values for eviction. As michael.cahill points out in the first comment, below, WT adapts midpoint insertion to this scheme by assigning newly read pages a read generation midway between the current and oldest sampled values in the cache.  

      An obvious drawback to midpoint insertion is that a large one-time scan can still evict half of the contents of the cache.  Newer cache algorithms, such as Multi Queue, ARC, LIRS, etc., have attempted to address this by dynamically changing responding to changes of locality in the request stream. None of these algorithms would directly map to the WT cache architecture, but it would still be worth investigating whether they could be adapted to improve WT eviction.

      Note that WT_SESSION::open_cursor() supports a read_once=true configuration option, which a client can set when it knows it will be performing a one time scan. MongoDB uses this option in some places where the query planner recognizes that it will read a lot of data just one time.  Obviously the benefit of midpoint insertion and similar techniques is that the cache can automatically detect and handle these scans rather than relying on application developers to provide the right hints.

            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            keith.smith@mongodb.com Keith Smith
            0 Vote for this issue
            4 Start watching this issue