Type: Task
Resolution: Works as Designed
Priority: Major - P3
Affects Version/s: None
Component/s: Checkpoints, Reconciliation
Storage Engines, Storage Engines - Persistence
In disagg, __rec_write_err incorrectly decrements bytes_total when freeing failed delta block writes, corrupting size accounting and triggering an assertion on the next successful checkpoint.
bytes_total is incremented by delta_size when block_disagg_write writes a delta. However, __wti_block_disagg_page_discard (called through __wt_btree_block_free) decrements it by cookie.size, which is the full cumulative chain size (old_cumulative + delta_size). This causes:
net change to bytes_total = +delta_size - (old_cumulative + delta_size) = -old_cumulative
The entire prior chain's contribution is erased. When the next successful checkpoint calls __wt_btree_decrease_size(cumulative_size), bytes_total < cumulative_size and the assertion fires.
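The arithmetic above can be replayed with a toy model. This is a minimal sketch, not WiredTiger code: the struct toy_btree and the functions toy_failed_delta_write and toy_can_decrease are hypothetical stand-ins, and the byte counts in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-btree size counter described above. */
struct toy_btree {
    int64_t bytes_total;
};

/*
 * Replay a failed delta write as the ticket describes it and return the
 * net change to bytes_total.
 */
int64_t
toy_failed_delta_write(struct toy_btree *bt, int64_t old_cumulative, int64_t delta_size)
{
    int64_t before = bt->bytes_total;

    /* The existing chain is assumed to be accounted for already. */
    assert(bt->bytes_total >= old_cumulative);

    /* The write path adds only the size of the new delta. */
    bt->bytes_total += delta_size;

    /* The discard path subtracts cookie.size, the cumulative chain size. */
    bt->bytes_total -= old_cumulative + delta_size;

    return (bt->bytes_total - before);
}

/* The precondition checked before a later decrement of the given size. */
bool
toy_can_decrease(const struct toy_btree *bt, int64_t size)
{
    return (bt->bytes_total >= size);
}
```

With an old cumulative size of 4096 bytes and a 512-byte delta, the net change is -4096, so a tree that held only that chain ends at bytes_total == 0 and toy_can_decrease(&bt, 4096) returns false.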
Proposed fix:
Before freeing a failed disagg delta block in __rec_write_err, compensate by adding back old_cumulative (retrieved from page->disagg_info->block_meta.cumulative_size, which is not updated until wrapup succeeds). The guard multi->block_meta != NULL && delta_count > 0 identifies disagg delta blocks. Full-page writes (delta_count == 0) are unaffected since cookie.size == size in that case.
Additionally, this fix resolves the underlying bug that caused the assert in __wt_btree_decrease_size to be disabled in WT-16738. With correct accounting, the assert that bytes_total >= size before decrement can be re-enabled.
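The proposed compensation can be sketched with a toy model. This is not the actual patch: toy_btree, toy_block_meta, toy_multi, toy_free_failed_block and toy_decrease_size are hypothetical names, and placing delta_count inside the block metadata is an assumption. Only the guard condition and the add-back of old_cumulative come from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-btree size counter. */
struct toy_btree {
    int64_t bytes_total;
};

/* Assumed subset of the block metadata: only the delta count matters here. */
struct toy_block_meta {
    uint32_t delta_count;
};

struct toy_multi {
    struct toy_block_meta *block_meta; /* NULL when this is not a disagg block */
    int64_t cookie_size;               /* size the discard path will subtract */
};

/* Free one failed block, compensating first if it is a disagg delta. */
void
toy_free_failed_block(struct toy_btree *bt, const struct toy_multi *multi, int64_t old_cumulative)
{
    /* Guard from the proposal: a disagg block with at least one delta. */
    if (multi->block_meta != NULL && multi->block_meta->delta_count > 0)
        bt->bytes_total += old_cumulative;

    /* The discard path still subtracts the cumulative size in the cookie. */
    bt->bytes_total -= multi->cookie_size;
}

/* Decrement guarded by the precondition the ticket wants re-enabled. */
void
toy_decrease_size(struct toy_btree *bt, int64_t size)
{
    assert(bt->bytes_total >= size);
    bt->bytes_total -= size;
}
```

In this model, a chain of 4096 bytes followed by a failed 512-byte delta leaves bytes_total at 4096 after the free, so a later toy_decrease_size(&bt, 4096) passes its assert. A full-page block (delta_count == 0) skips the add-back, which matches the claim that full-page writes are unaffected.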
Definition of Done:
- Fix delta chain bytes_total accounting in __rec_write_err for failed disagg delta writes
- Re-enable the assert in __wt_btree_decrease_size (disabled in WT-16738)
- Run a full disagg patch
is related to:
- WT-16738 Investigate if we double account root size in checkpoint size (Closed)