Fix bytes_total not being rolled back on addr_pack failure in reconciliation

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      In __wti_block_disagg_write, bytes_total is incremented immediately after the page is successfully written to SLS. If address packing subsequently fails, the function returns the error without any cleanup, so the increment to bytes_total is never rolled back.

      The error propagates up into reconciliation (__rec_split_write), where the cleanup path (rec_write_err) checks for a block cookie before freeing the written block. The cookie is populated by __wt_memdup, which only executes if addr_pack succeeds, so after an addr_pack failure the check is false and no decrement occurs. Because the reconciliation cleanup is gated on the cookie existing, it can never roll back an addr_pack failure; the decrement must happen in the block manager, at the point of failure, before the error is returned.

      The result is permanent bytes_total inflation: the page server eventually garbage-collects the orphaned page, but WT has no mechanism to decrement bytes_total to match, so every subsequent checkpoint size is overstated by the size of the orphaned write.
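
      The rollback described above could be sketched roughly as follows. This is a simplified, single-threaded model and not the WiredTiger code: demo_block, demo_page_write, demo_addr_pack and demo_disagg_write are hypothetical names invented for illustration, and the rollback flag exists only so the buggy and fixed behaviour can be compared side by side. The real counter may well need an atomic update.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the block manager's accounting state. */
typedef struct {
    uint64_t bytes_total; /* bytes believed to be live in shared storage */
} demo_block;

/* Hypothetical page write: always succeeds in this sketch. */
static int
demo_page_write(demo_block *b, size_t size)
{
    (void)b;
    (void)size;
    return 0;
}

/* Hypothetical address packing: fails on demand to simulate the error. */
static int
demo_addr_pack(bool fail)
{
    return fail ? EINVAL : 0;
}

/* Write a page, count it, and undo the count if address packing fails. */
static int
demo_disagg_write(demo_block *b, size_t size, bool pack_fails, bool rollback)
{
    int ret;

    if ((ret = demo_page_write(b, size)) != 0)
        return ret;
    b->bytes_total += size; /* counted as soon as the write lands */

    if ((ret = demo_addr_pack(pack_fails)) != 0) {
        /*
         * No cookie will ever be set on this path, so the caller's
         * cleanup cannot undo the increment. Undo it here instead.
         */
        if (rollback) {
            assert(b->bytes_total >= size);
            b->bytes_total -= size;
        }
        return ret;
    }
    return 0;
}
```

      With rollback enabled, a failed call leaves bytes_total unchanged. With it disabled, each failed call inflates the counter by the size of the orphaned write, which is the behaviour this ticket reports.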

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Mariam Mojid