- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Checkpoints
- None
- Storage Engines, Storage Engines - Persistence
- SE Persistence backlog
- None
When a page with a delta chain is re-read from the page service (e.g., after eviction or connection reopen), block_meta->cumulative_size is set to the raw size of the most recent block instead of the true cumulative total of base + all deltas.
The bug is in __block_disagg_read_multiple (block_disagg_read.c). The function receives the correct cumulative size via its size parameter (from cookie.size on the address cookie), but the loop overwrites size on each iteration with individual block sizes:
size = (uint32_t)current->size;
The block_meta->cumulative_size = size assignment sat inside the result == last branch, which executes on the first loop iteration, after size has already been overwritten. As a result, cumulative_size ends up holding only the most recent block's raw size rather than the cumulative total.
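The overwrite described above can be reproduced in isolation. The following is a minimal sketch, not the WiredTiger code: the sketch_* types and function are hypothetical stand-ins, and it assumes the block array is ordered base first with the most recent delta at index last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real block and metadata structures. */
struct sketch_block {
    size_t size; /* raw size of one block (base or delta) */
};
struct sketch_block_meta {
    uint32_t cumulative_size;
};

/*
 * Buggy shape of the read loop: size arrives holding the cumulative total from
 * the cookie, but it is overwritten with the current block's size before the
 * result == last branch stores it. Returns the sum of the block sizes seen.
 */
static uint32_t
sketch_read_buggy(const struct sketch_block *blocks, int count, uint32_t size,
  struct sketch_block_meta *meta)
{
    uint32_t sum = 0;
    int last, result;

    assert(count > 0);
    last = count - 1;
    for (result = last; result >= 0; result--) {
        const struct sketch_block *current = &blocks[result];

        size = (uint32_t)current->size; /* the cookie's cumulative total is lost here */
        if (result == last)
            meta->cumulative_size = size; /* stores only the most recent block's size */
        sum += size;
    }
    return (sum);
}
```

Calling sketch_read_buggy with blocks of 4096, 1024 and 1024 bytes and a cookie size of 6144 leaves meta->cumulative_size at 1024, even though the blocks sum to 6144.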
After re-read, subsequent delta writes compute cookie.size incorrectly:
// In __wti_block_disagg_write (block_disagg_write.c)
cookie.size = block_meta->cumulative_size + size; // Uses wrong (too small) cumulative_size from the re-read
When the delta chain eventually terminates, the discard path subtracts this too-small cookie.size from bytes_total, causing a permanent leak.
Concrete example:
- Write base page (4KB): cumulative_size = 4KB, cookie.size = 4KB
- Write delta1 (1KB): cumulative_size = 5KB, cookie.size = 5KB
- Write delta2 (1KB): cumulative_size = 6KB, cookie.size = 6KB
- Page evicted and re-read: cumulative_size = 1KB (BUG – should be 6KB)
- Write delta3 (1KB): cookie.size = 1KB + 1KB = 2KB (should be 7KB)
- Delta chain terminates, discard: bytes_total -= 2KB instead of -= 7KB, so 5KB is leaked
This leak compounds with every eviction/re-read cycle and is proportional to the size of the base page + early deltas in the chain.
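The arithmetic in the example can be checked with a small accounting model. This is a sketch under stated assumptions: all sketch_* names are hypothetical, bytes_total is assumed to grow by the size of each block written, and the discard path is assumed to subtract cookie.size as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page accounting state. */
struct sketch_page {
    uint32_t cumulative_size; /* models block_meta->cumulative_size */
    uint32_t cookie_size;     /* models cookie.size */
};

/* Write a base page or delta: the cookie carries the new cumulative total. */
static void
sketch_write(struct sketch_page *page, uint32_t size, int64_t *bytes_total)
{
    assert(page != NULL && bytes_total != NULL);
    page->cookie_size = page->cumulative_size + size;
    page->cumulative_size = page->cookie_size;
    *bytes_total += size; /* assumption: the total grows by each block written */
}

/* Re-read after eviction: the buggy path keeps only the last block's size. */
static void
sketch_reread(struct sketch_page *page, int buggy, uint32_t last_block_size)
{
    page->cumulative_size = buggy ? last_block_size : page->cookie_size;
}

/* Discard the chain: subtract whatever the cookie claims. */
static void
sketch_discard(const struct sketch_page *page, int64_t *bytes_total)
{
    *bytes_total -= page->cookie_size;
}

/* Replay the example; returns the bytes still counted after the discard. */
static int64_t
sketch_run_example(int buggy)
{
    struct sketch_page page = {0, 0};
    int64_t bytes_total = 0;

    sketch_write(&page, 4096, &bytes_total); /* base page */
    sketch_write(&page, 1024, &bytes_total); /* delta1 */
    sketch_write(&page, 1024, &bytes_total); /* delta2 */
    sketch_reread(&page, buggy, 1024);       /* eviction and re-read */
    sketch_write(&page, 1024, &bytes_total); /* delta3 */
    sketch_discard(&page, &bytes_total);
    return (bytes_total);
}
```

Under these assumptions, sketch_run_example(1) returns 5120 (the 5KB leak from the example) and sketch_run_example(0) returns 0.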
Fix
Move the block_meta->cumulative_size assignment before the loop, while size still holds the function parameter (the correct cumulative from the cookie):
block_meta->cumulative_size = size;
for (result = last; result >= 0; result--) {
...
size = (uint32_t)current->size;
...
}
Added a post-loop diagnostic assert that verifies the cookie's cumulative matches the sum of all individual block sizes returned by the page service:
WT_ASSERT(session, block_meta->cumulative_size == block_size_sum);
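The fixed ordering and the consistency check can be sketched together as follows. This is an illustration only: the sketch_* names are hypothetical, block_size_sum is assumed to be accumulated inside the loop, and the function returns the comparison result instead of invoking WT_ASSERT so it can run without a session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real block and metadata structures. */
struct sketch_block {
    size_t size; /* raw size of one block (base or delta) */
};
struct sketch_block_meta {
    uint32_t cumulative_size;
};

/*
 * Fixed shape of the read loop: the cumulative size is captured before the
 * loop reuses size for individual blocks. Returns non-zero when the cookie's
 * cumulative size matches the sum of the individual block sizes.
 */
static int
sketch_read_fixed(const struct sketch_block *blocks, int count, uint32_t size,
  struct sketch_block_meta *meta)
{
    uint32_t block_size_sum = 0;
    int last, result;

    assert(count > 0);
    last = count - 1;
    meta->cumulative_size = size; /* size still holds the cookie's cumulative total */
    for (result = last; result >= 0; result--) {
        size = (uint32_t)blocks[result].size; /* per-block size, safe to overwrite now */
        block_size_sum += size;
    }
    /* Stand-in for the post-loop diagnostic assert. */
    return (meta->cumulative_size == block_size_sum);
}
```

With blocks of 4096, 1024 and 1024 bytes, a cookie size of 6144 passes the check, while a stale cookie size such as 1024 fails it.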