Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- foggy-future
- tiered-storage

Sprint: None
Story Points: 8

A typical WT data file may contain regions that are no longer in use. This can happen when a checkpoint is deleted and it was the only reference to some number of blocks. In a workload with many deletes or truncates, such gaps may make up a larger proportion of the file.
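
To make the notion of a gap concrete, here is a rough sketch of how the unused regions of a file could be computed from the extent lists of the checkpoints that are still live. Everything in it is an assumption for illustration: the `(offset, length)` tuple representation and the function names `merge_extents` and `find_gaps` are hypothetical and are not WT's block manager API, and the sketch ignores any file header or descriptor block.

```python
from typing import List, Tuple

# Hypothetical representation: (offset, length) in bytes.
Extent = Tuple[int, int]


def merge_extents(extents: List[Extent]) -> List[Extent]:
    """Merge overlapping or adjacent extents into a sorted, disjoint list."""
    merged: List[Extent] = []
    for off, length in sorted(extents):
        if length <= 0:
            continue
        if merged and off <= merged[-1][0] + merged[-1][1]:
            # Overlaps or touches the previous extent: extend it.
            prev_off, prev_len = merged[-1]
            new_end = max(prev_off + prev_len, off + length)
            merged[-1] = (prev_off, new_end - prev_off)
        else:
            merged.append((off, length))
    return merged


def find_gaps(file_size: int, live: List[Extent]) -> List[Extent]:
    """Return the regions of the file not covered by any live extent."""
    gaps: List[Extent] = []
    cursor = 0
    for off, length in merge_extents(live):
        if off > cursor:
            gaps.append((cursor, off - cursor))
        cursor = max(cursor, off + length)
    if cursor < file_size:
        gaps.append((cursor, file_size - cursor))
    return gaps


# Two live checkpoints that share one block (made-up numbers).
checkpoint_a = [(0, 4096), (8192, 4096)]
checkpoint_b = [(8192, 8192)]
gaps = find_gaps(32768, checkpoint_a + checkpoint_b)
unused_fraction = sum(length for _, length in gaps) / 32768
```

The unused fraction computed this way is the upper bound on what could be saved for a given file, before accounting for any per-object header.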

Once we decide (via flush_tier) that a data file is finalized and will become read-only, we have an opportunity when writing it to the cloud. We can look at the extent lists for any active checkpoints in the file and simply not write any blocks that do not appear in them. Each object written to the cloud would probably need a header to indicate which blocks are missing. The header would be used when the file is read to "reconstruct" the file on disk, or it could be used with any sort of "disk file fragment" cache we have in operation. Accesses to gaps "shouldn't happen".
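
As a minimal sketch of the idea above, the following packs only the live extents of a file into one object, preceded by a header, and rebuilds a full-size image on read. The header layout (a `SPRS` magic, the original file size, and a list of offset/length pairs) is invented for this example and is not an existing WT or cloud object format. This sketch records which extents are present rather than which are missing; either form carries the same information. The function names `pack_sparse` and `unpack_sparse` are likewise hypothetical, and gaps are zero-filled on reconstruction on the assumption that they are never read.

```python
import struct
from typing import List, Tuple

# Hypothetical representation: (offset, length) in bytes.
Extent = Tuple[int, int]

MAGIC = b"SPRS"                  # made-up marker, not a WT format
HEADER = struct.Struct("<4sQI")  # magic, original file size, extent count
ENTRY = struct.Struct("<QQ")     # offset and length of one live extent


def pack_sparse(data: bytes, live: List[Extent]) -> bytes:
    """Build an object holding a header plus only the live extents.

    `live` is assumed to be disjoint and to lie within the file.
    """
    live = sorted(live)
    parts = [HEADER.pack(MAGIC, len(data), len(live))]
    parts += [ENTRY.pack(off, length) for off, length in live]
    parts += [data[off:off + length] for off, length in live]
    return b"".join(parts)


def unpack_sparse(obj: bytes) -> bytes:
    """Rebuild a full-size file image; gaps are zero-filled."""
    magic, file_size, count = HEADER.unpack_from(obj, 0)
    if magic != MAGIC:
        raise ValueError("not a sparse object")
    pos = HEADER.size
    extents = []
    for _ in range(count):
        extents.append(ENTRY.unpack_from(obj, pos))
        pos += ENTRY.size
    image = bytearray(file_size)
    for off, length in extents:
        # Copy each stored extent back to its original offset.
        image[off:off + length] = obj[pos:pos + length]
        pos += length
    return bytes(image)


# Four 4 KB blocks, of which only the first and third are live.
block = 4096
data = b"A" * block + b"x" * block + b"B" * block + b"y" * block
live = [(0, block), (2 * block, block)]
obj = pack_sparse(data, live)
rebuilt = unpack_sparse(obj)
```

A "disk file fragment" cache could use the same header to map a file offset to a position inside the object without rebuilding the whole file; that lookup is not shown here.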

Note that this optimization may be useful only when using tiered storage that isn't shared. When we start to share tiered storage, we will probably be using union tables, which will have the effect of creating tiered data files that are "tight", without any gaps. So it's essential to look at any expected gains in this light.

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Donald Anderson
Votes: 0
Watchers: 4

Created: Oct 27 2022 07:26:25 PM UTC
Updated: May 06 2023 12:57:08 AM UTC