Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Sprint:
None
Story Points:
None

We should investigate improvements to the WT cache eviction algorithm based on newer cache replacement algorithms.

WiredTiger approximates a widely-used cache replacement algorithm—LRU with midpoint insertion—to provide scan resistance. I.e., to avoid the problem where a one-time scan of a large table causes everything else to be evicted in a standard LRU cache. In a traditional implementation, this works using a conventional LRU list. When when new data is added to the cache, it is inserted at the middle of the LRU list. When data is used a second time (i.e., when there is a cache hit) the data is moved to the beginning of the LRU list. Thus, data that is just touched once during a sequential scan will enter the middle of the LRU list and progress to the end, at which point it will be evicted. Hot data that has been promoted to the beginning of the LRU list remains in the cache and isn't displaced by the scan.

WiredTiger approximates LRU cache replacement using a read generation value assigned to each page. WT prioritizes pages with older read generation values for eviction. As michael.cahill points out in the first comment, below, WT adapts midpoint insertion to this scheme by assigning newly read pages a read generation midway between the current and oldest sampled values in the cache.

An obvious drawback to midpoint insertion is that a large one-time scan can still evict half of the contents of the cache. Newer cache algorithms, such as Multi Queue, ARC, LIRS, etc., have attempted to address this by dynamically changing responding to changes of locality in the request stream. None of these algorithms would directly map to the WT cache architecture, but it would still be worth investigating whether they could be adapted to improve WT eviction.

Note that WT_SESSION::open_cursor() supports a read_once=true configuration option, which a client can set when it knows it will be performing a one time scan. MongoDB uses this option in some places where the query planner recognizes that it will read a lot of data just one time. Obviously the benefit of midpoint insertion and similar techniques is that the cache can automatically detect and handle these scans rather than relying on application developers to provide the right hints.

Assignee:: [DO NOT USE] Backlog - Storage Engines Team
Reporter:: Keith Smith
Votes:: 0 Vote for this issue
Watchers:: 4 Start watching this issue

Created:: Jan 04 2021 10:14:37 PM UTC
Updated:: Apr 05 2022 01:10:02 AM UTC

Details

Description

Attachments

Activity

People

Dates