Page meta Cumulative size incorrectly set on page re-read from page service


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0
    • Affects Version/s: None
    • Component/s: Checkpoints
    • None
    • Storage Engines, Storage Engines - Persistence
    • SE Persistence backlog
    • None

      When a page with a delta chain is re-read from the page service (e.g., after eviction or connection reopen), block_meta->cumulative_size is set to the raw size of the most recent block instead of the true cumulative total of base + all deltas.

      The bug is in __block_disagg_read_multiple (block_disagg_read.c). The function receives the correct cumulative size via its size parameter (from cookie.size on the address cookie), but the loop overwrites size on each iteration with individual block sizes:

      size = (uint32_t)current->size;
      

      The block_meta->cumulative_size = size assignment was inside the result == last branch, which executes on the first loop iteration, after size has already been overwritten. As a result, cumulative_size ends up holding only the most recent block's raw size rather than the cumulative total.
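A minimal sketch of the pre-fix control flow described above may make the ordering problem easier to see. It is not the WiredTiger source: `struct sketch_block`, `sketch_cumulative_buggy`, and the array layout (base image at index 0, most recent delta at the last index) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one block returned by the page service. */
struct sketch_block {
    uint32_t size; /* raw size of this block only */
};

/*
 * Mimics the pre-fix loop. Assumes blocks[n - 1] is the most recent delta and
 * blocks[0] is the base image. "size" arrives holding the cookie's cumulative
 * size. Returns the value that ends up in cumulative_size.
 */
static uint32_t
sketch_cumulative_buggy(const struct sketch_block *blocks, size_t n, uint32_t size)
{
    uint32_t cumulative_size = 0;
    int last, result;

    assert(n > 0);
    last = (int)n - 1;

    for (result = last; result >= 0; result--) {
        size = blocks[result].size; /* the parameter is overwritten here */
        if (result == last)
            cumulative_size = size; /* too late: size is now one block's raw size */
    }
    return cumulative_size;
}
```

With a 4KB base and two 1KB deltas, passing 6144 as the cookie's cumulative size still yields 1024 in this sketch, because the assignment reads `size` only after the overwrite.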

      After re-read, subsequent delta writes compute cookie.size incorrectly:

      // In __wti_block_disagg_write (block_disagg_write.c)
      cookie.size = block_meta->cumulative_size + size;
      // Uses wrong (too small) cumulative_size from the re-read
      

      When the delta chain eventually terminates, the discard path subtracts this too-small cookie.size from bytes_total. The difference is never subtracted, so bytes_total stays permanently inflated (a permanent leak).

      Concrete example:

      1. Write base page (4KB): cumulative_size = 4KB, cookie.size = 4KB
      2. Write delta1 (1KB): cumulative_size = 5KB, cookie.size = 5KB
      3. Write delta2 (1KB): cumulative_size = 6KB, cookie.size = 6KB
      4. Page evicted and re-read: cumulative_size = 1KB (BUG – should be 6KB)
      5. Write delta3 (1KB): cookie.size = 1KB + 1KB = 2KB (should be 7KB)
      6. Delta chain terminates, discard: bytes_total -= 2KB instead of -= 7KB, so 5KB is leaked

      This leak compounds with every eviction/re-read cycle and is proportional to the size of the base page + early deltas in the chain.
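The arithmetic in the concrete example can be replayed with a small model. This is a sketch under stated assumptions, not the real accounting code: `sketch_leaked_bytes` is a hypothetical helper, and it assumes bytes_total grows by each block's raw size on write and shrinks by cookie.size on discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_KB 1024u

/*
 * Replays steps 1-6 of the example. Returns the bytes still counted in
 * bytes_total after the chain is discarded (0 means nothing leaked).
 */
static uint64_t
sketch_leaked_bytes(bool buggy_reread)
{
    uint64_t bytes_total = 0;
    uint32_t cumulative_size, cookie_size;

    /* Step 1: write the 4KB base page. */
    cookie_size = 4 * SKETCH_KB;
    cumulative_size = cookie_size;
    bytes_total += 4 * SKETCH_KB;

    /* Steps 2 and 3: write two 1KB deltas. */
    for (int i = 0; i < 2; i++) {
        cookie_size = cumulative_size + 1 * SKETCH_KB;
        cumulative_size = cookie_size;
        bytes_total += 1 * SKETCH_KB;
    }

    /* Step 4: evict and re-read; the bug keeps only the last block's size. */
    cumulative_size = buggy_reread ? 1 * SKETCH_KB : cookie_size;

    /* Step 5: write delta3 (1KB). */
    cookie_size = cumulative_size + 1 * SKETCH_KB;
    bytes_total += 1 * SKETCH_KB;

    /* Step 6: the chain terminates and the discard path subtracts cookie.size. */
    assert(bytes_total >= cookie_size);
    bytes_total -= cookie_size;
    return bytes_total;
}
```

Under these assumptions, `sketch_leaked_bytes(true)` returns 5120 (the 5KB from the example) and `sketch_leaked_bytes(false)` returns 0.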

      Fix

      Move the block_meta->cumulative_size assignment before the loop, while size still holds the function parameter (the correct cumulative size from the cookie):

      block_meta->cumulative_size = size;
      
      for (result = last; result >= 0; result--) {
          ...
          size = (uint32_t)current->size;
          ...
      }
      

      Added a post-loop diagnostic assert that verifies the cookie's cumulative matches the sum of all individual block sizes returned by the page service:

      WT_ASSERT(session, block_meta->cumulative_size == block_size_sum);
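For illustration, the reordered assignment and the post-loop check could be combined roughly as follows. This is a standalone sketch rather than the actual patch: `sketch_cumulative_fixed` is a hypothetical name, a plain array of sizes stands in for the page service results, and the standard `assert` stands in for WT_ASSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * block_sizes[n - 1] is assumed to be the most recent delta and block_sizes[0]
 * the base image. "size" arrives holding the cookie's cumulative size.
 */
static uint32_t
sketch_cumulative_fixed(const uint32_t *block_sizes, size_t n, uint32_t size)
{
    uint32_t cumulative_size, block_size_sum = 0;
    int result;

    assert(n > 0);

    /* Capture the cookie's cumulative size before the loop reuses "size". */
    cumulative_size = size;

    for (result = (int)n - 1; result >= 0; result--) {
        size = block_sizes[result]; /* per-block raw size, as in the original loop */
        block_size_sum += size;
    }

    /* Diagnostic check: the cookie must agree with the sum of the blocks read. */
    assert(cumulative_size == block_size_sum);
    return cumulative_size;
}
```

For a 4KB base and two 1KB deltas with a cookie size of 6144, the sketch returns 6144 and the check passes; a cookie produced by the buggy path (for example 2048 for the same three blocks) would trip the assert.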
      

            Assignee:
            Luke Pearson
            Reporter:
            Luke Pearson