WiredTiger / WT-11322

Copy blocks smaller than the chunk size to properly sized buffer

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Block manager
    • Labels: None
    • Team: Storage Engines
    • Story Points: 5
    • Sprint: Joker - StorEng - 2023-10-17, 2024-01-09 - I Grew Tired, StorEng - 2024-01-23, 2024-02-06 tapioooooooooooooca, 2024-02-20_A_near-death_puffin, 2024-03-05 - Claronald, 2024-03-19 - PacificOcean, Megabat - 2024-05-14

      An alternative approach to addressing WT-10831 would be to copy blocks to new, correctly sized buffers during read.

      WT-10831 addressed a bug where WT incorrectly tracked the amount of memory allocated to the cache. Specifically, when reading a page that is smaller than the configured chunk size, WT would read the page into a chunk-sized piece of memory, but only charge the cache for the page size. For example, if we stored a 1KB page in a 4KB block, then during read WT would allocate 4KB of memory, read the 4KB file chunk into that memory, and increase the internal cache statistics by only 1KB. On workloads with many small blocks of this type, WT would wind up drastically overcommitting the cache, sometimes leading to OOM kills.
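
      To make the arithmetic concrete, here is a minimal standalone sketch of the pre-WT-10831 behavior. The names (CHUNK_SIZE, align_to_chunk) are hypothetical and are not WiredTiger internals; plain malloc stands in for the block manager's read buffer.

      #include <stdio.h>
      #include <stdlib.h>

      #define CHUNK_SIZE 4096                 /* configured allocation/chunk size */

      /* Round a length up to the next multiple of the chunk size. */
      static size_t
      align_to_chunk(size_t len)
      {
          return (((len + CHUNK_SIZE - 1) / CHUNK_SIZE) * CHUNK_SIZE);
      }

      int
      main(void)
      {
          size_t page_size = 1024;                       /* logical page size */
          size_t alloc_size = align_to_chunk(page_size); /* buffer actually allocated: 4096 */
          char *buf;

          if ((buf = malloc(alloc_size)) == NULL)
              return (1);
          /* ... read the 4KB file chunk into buf ... */

          /* Pre-WT-10831: the cache was billed page_size (1KB), not alloc_size (4KB). */
          printf("allocated %zu, billed %zu, untracked %zu bytes\n",
              alloc_size, page_size, alloc_size - page_size);

          free(buf);
          return (0);
      }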

      Note that this is only an issue for files that are not compressed. When blocks are compressed, WT reads the blocks from disk, gets the in-memory (i.e., decompressed) page size from the block header, allocates a properly sized buffer, and then decompresses the block into that buffer.
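
      A rough sketch of why the compressed path is unaffected; the block_header layout and the decompress stub below are made up for illustration and do not reflect WiredTiger's actual structures or APIs.

      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical on-disk block header carrying the in-memory page size. */
      struct block_header {
          size_t disk_size;   /* size of the compressed block on disk */
          size_t mem_size;    /* size of the page once decompressed */
      };

      /* Stub standing in for the configured compressor's decompress call. */
      static int
      decompress(const void *src, size_t src_len, void *dst, size_t dst_len)
      {
          (void)src;
          (void)src_len;
          memset(dst, 0, dst_len);   /* a real compressor would expand src into dst */
          return (0);
      }

      /*
       * For compressed blocks there is no waste: the destination buffer is sized
       * from the header's mem_size, so the cache is billed exactly what is used.
       */
      static void *
      read_compressed_block(const void *disk_buf, const struct block_header *hdr,
          size_t *billed_sizep)
      {
          void *page;

          if ((page = malloc(hdr->mem_size)) == NULL)   /* properly sized buffer */
              return (NULL);
          if (decompress(disk_buf, hdr->disk_size, page, hdr->mem_size) != 0) {
              free(page);
              return (NULL);
          }
          *billed_sizep = hdr->mem_size;                /* charge matches usage */
          return (page);
      }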

      WT-10831 addressed the problem by properly accounting for the size of the buffer used to read data from disk. So in my example, WT-10831 increases the cache stats by 4KB, rather than 1KB, thus accurately accounting for the memory consumed by the cache. The downside is that this results in less effective use of the cache, as we are (correctly) billing the cache for unused space in these buffers (i.e., 3KB of unused space in the example). This can hurt performance as shown in WT-11320.
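
      A minimal sketch of the WT-10831-style accounting under the same hypothetical setup: the cache is charged for the full, rounded-up allocation, which is accurate but counts the unused tail of the buffer against the cache budget.

      #include <stddef.h>

      #define CHUNK_SIZE 4096

      /*
       * WT-10831 approach (sketch): bill the cache for the buffer actually
       * allocated, not the logical page size.  For a 1KB page this returns
       * 4096: 1KB of data plus 3KB of slack charged to the cache.
       */
      size_t
      cache_charge_wt10831(size_t page_size)
      {
          return (((page_size + CHUNK_SIZE - 1) / CHUNK_SIZE) * CHUNK_SIZE);
      }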

      An alternative approach to addressing WT-10831 would be to copy the data to a properly sized buffer when this happens. So in the example, we would read the 4KB block from disk, allocate a 1KB buffer to hold the page, copy the data into that 1KB buffer, free the original 4KB buffer, and increase the cache stats by 1KB. This would result in caching the same amount of data as before WT-10831, but with the overhead of a data copy for each affected block. Since these blocks are typically small, that overhead should be modest compared to the savings (i.e., avoiding future disk I/O by caching more data).
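
      A sketch of what the copy-on-read path could look like, with plain malloc/memcpy standing in for WiredTiger's buffer management; shrink_to_fit and its signature are purely illustrative.

      #include <stdlib.h>
      #include <string.h>

      /*
       * Copy-on-read sketch: the caller has already read the chunk-sized buffer
       * from disk.  Move the page into a right-sized buffer, free the original,
       * and report the smaller size for cache accounting.
       */
      void *
      shrink_to_fit(void *disk_buf, size_t alloc_size, size_t page_size,
          size_t *billed_sizep)
      {
          void *page;

          if ((page = malloc(page_size)) == NULL) {
              /* On allocation failure, keep the original buffer and bill it fully. */
              *billed_sizep = alloc_size;
              return (disk_buf);
          }

          memcpy(page, disk_buf, page_size);   /* copy only the live 1KB of data */
          free(disk_buf);                      /* release the 4KB read buffer */

          *billed_sizep = page_size;           /* cache is billed 1KB, matching usage */
          return (page);
      }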

      Because pages are rarely exact multiples of the chunk size, a naïve implementation would wind up copying essentially every uncompressed block. I would suggest adding a threshold so that we only copy if we save more than X bytes of cache space (or Y%?), relying on WT-10831 to keep the accounting accurate for other cases.
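
      One possible shape for that threshold check; the constants, and the choice to combine an absolute cutoff with a percentage cutoff, are illustrative only and would need tuning against benchmarks.

      #include <stdbool.h>
      #include <stddef.h>

      /* Illustrative cutoffs only; real values would come from benchmarking. */
      #define COPY_MIN_SAVINGS_BYTES 1024     /* X: absolute cache savings required */
      #define COPY_MIN_SAVINGS_PCT   25       /* Y: savings as a % of the buffer */

      /*
       * Decide whether shrinking the buffer is worth a copy.  When it is not,
       * fall back to WT-10831 accounting and bill the cache for the full buffer.
       */
      bool
      worth_copying(size_t alloc_size, size_t page_size)
      {
          size_t savings = alloc_size - page_size;

          return (savings >= COPY_MIN_SAVINGS_BYTES &&
              savings * 100 >= alloc_size * COPY_MIN_SAVINGS_PCT);
      }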

       

            Assignee:
            Unassigned
            Reporter:
            Keith Smith (keith.smith@mongodb.com)
            Ruby Chen
            Votes:
            0
            Watchers:
            5

              Created:
              Updated: