- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Block Manager
- Storage Engines
There are several situations in which WiredTiger could benefit from replacing a full page image plus a long chain of deltas with a single equivalent full page image, without having to rewrite the internal pages of the btree all the way to the root.
For example, this could be very useful in the following cases:
- Block cache: If we use the victim cache architecture, we would read a page straight into the WiredTiger cache (bypassing the block cache) and write it to the block cache only when it is evicted from the WiredTiger cache. Because we flatten deltas after a read by default, we would then be evicting a full page image rather than the original image plus deltas.
- Off-peak processing: If a database is not experiencing much load, it could fetch some pages with long delta chains and replace them in the storage service with the equivalent full page image (provided that the storage service supports this feature).
The new full page image would have a different checksum than the original image plus deltas, so we would not be able to rely on the checksum stored in the address cookie to validate the page. We could solve this as follows:
- Add a new original_checksum field to the disagg header. An original image would have it set to 0.
- Add a new flag that specifies whether the page is a replacement page of an earlier original image.
- If we replace an original image + deltas by a new image, the new image’s header would set this flag, and it would set the original_checksum field to the checksum from the original, which would match the checksum from the address cookie (i.e., the checksum of the delta image to which the address cookie points).
- During read, the block manager would first validate the page using the checksum from the page’s own block header. Then, if the flag is set, it would compare the value of original_checksum to the checksum from the address cookie; otherwise it would compare the calculated checksum to the checksum from the address cookie.
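
The read-side check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the struct layout, the flag name SKETCH_FLAG_REPLACEMENT, and all sketch_* functions are hypothetical and do not reflect the real disagg header or block manager code, and FNV-1a is used purely as a stand-in for whatever checksum the block manager actually computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: this image replaces an earlier original image plus deltas. */
#define SKETCH_FLAG_REPLACEMENT 0x01u

/* Hypothetical, simplified header; not the real disagg header layout. */
struct sketch_header {
    uint32_t checksum;          /* checksum of this block (excludes this field) */
    uint32_t original_checksum; /* 0 for an original image */
    uint8_t flags;
};

/* Placeholder checksum step (FNV-1a), standing in for the real algorithm. */
static uint32_t sketch_fnv1a(uint32_t h, const void *data, size_t len) {
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Checksum over the header fields other than checksum, then the payload. */
static uint32_t sketch_block_checksum(
    const struct sketch_header *hdr, const uint8_t *payload, size_t len) {
    uint32_t h = 2166136261u;
    h = sketch_fnv1a(h, &hdr->original_checksum, sizeof(hdr->original_checksum));
    h = sketch_fnv1a(h, &hdr->flags, sizeof(hdr->flags));
    return sketch_fnv1a(h, payload, len);
}

/* Build a header for an original image or for a replacement image. */
static struct sketch_header sketch_make_header(
    const uint8_t *payload, size_t len, bool replacement, uint32_t original_checksum) {
    struct sketch_header hdr = {0, 0, 0};
    if (replacement) {
        hdr.flags = SKETCH_FLAG_REPLACEMENT;
        hdr.original_checksum = original_checksum;
    }
    hdr.checksum = sketch_block_checksum(&hdr, payload, len);
    return hdr;
}

/* Two-step validation: block self-consistency, then match against the cookie. */
static bool sketch_validate(const struct sketch_header *hdr, const uint8_t *payload,
    size_t len, uint32_t cookie_checksum) {
    assert(hdr != NULL && payload != NULL);
    uint32_t calculated = sketch_block_checksum(hdr, payload, len);

    /* Step 1: the block must match the checksum in its own header. */
    if (calculated != hdr->checksum)
        return false;

    /* Step 2: a replacement carries the checksum the cookie expects. */
    if (hdr->flags & SKETCH_FLAG_REPLACEMENT)
        return hdr->original_checksum == cookie_checksum;
    return calculated == cookie_checksum;
}
```

In this sketch the address cookie never changes: a replacement image passes validation because it carries the cookie's checksum in original_checksum, which is what would let us avoid rewriting the parent pages.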