- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Engines - Transactions
- 98.901
Problem
__sync_checkpoint_can_skip() (src/btree/bt_sync.c:11) decides whether checkpoint's tree walk can skip reconciling a page. Its current skip conditions (leaf page, not HS/disagg-meta, txn->snapshot_data.snap_max < mod->first_dirty_txn, no unresolved multiblock addresses, not RTS/recovery/closing, active snapshot) never look at whether the page was already reconciled by eviction under the same stable timestamp.
Eviction already has an analogous optimization: under WT_CONN_PRECISE_CHECKPOINT, _evict_page (src/evict/evict_page.c:1145-1158) refuses to re-reconcile a page if it was already reconciled at a pinned stable timestamp >= the in-progress checkpoint's timestamp (tracked via mod->rec_pinned_stable_timestamp, src/include/btmem.h:386, set in _rec_write_page_status() at src/reconcile/rec_write.c:483), to avoid duplicating work. Checkpoint has no equivalent check on its own side.
As a result, if eviction reconciles a page under precise checkpoints, and the page is later dirtied again by updates that aren't relevant to the current checkpoint, checkpoint's tree walk will reconcile that page again from scratch even though eviction already did equivalent work.
Note mod->rec_max_txn (btmem.h:384) is not usable for this check: it's only advanced for updates that reconciliation's selection loops (_rec_upd_select / _rec_upd_select_inmem in src/reconcile/rec_visibility.c) consider visible/selectable. Updates skipped for being not-yet-visible, not-yet-stable, or unresolved-prepared don't advance it — they only set has_newer_updatesp. So rec_max_txn cannot prove that no update with a txn id between it and the checkpoint's snap_min exists on the page (e.g. an update invisible at eviction-reconciliation time but visible to this checkpoint's snapshot).
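The gap described above can be illustrated with a toy model. This is a sketch only, not WiredTiger code: `sketch_update`, `sketch_maxes`, and `sketch_scan_chain` are hypothetical names, and the single `selectable` flag stands in for the real visibility, stability, and prepared-state checks. It shows a visibility-gated maximum (modelling `rec_max_txn`) ending up below the highest transaction id actually present on the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one update on a page's update chain. */
struct sketch_update {
    uint64_t txnid;
    bool selectable; /* Assumed: visible, stable, and resolved at reconciliation time. */
    bool aborted;
};

/* Results of one pass over a chain. */
struct sketch_maxes {
    uint64_t gated_max;     /* Models rec_max_txn: advanced only by selectable updates. */
    uint64_t true_max;      /* Models the proposed field: advanced by every live update. */
    bool has_newer_updates; /* Models has_newer_updatesp being set. */
};

/* Walk the chain once and compute both maxima side by side. */
struct sketch_maxes
sketch_scan_chain(const struct sketch_update *upds, size_t n)
{
    struct sketch_maxes m = {0, 0, false};
    size_t i;

    assert(upds != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* Aborted updates are excluded from both maxima. */
        if (upds[i].aborted)
            continue;

        /* The true maximum sees every remaining update, selectable or not. */
        if (upds[i].txnid > m.true_max)
            m.true_max = upds[i].txnid;

        /* A skipped update only raises the "newer updates" flag. */
        if (!upds[i].selectable) {
            m.has_newer_updates = true;
            continue;
        }

        if (upds[i].txnid > m.gated_max)
            m.gated_max = upds[i].txnid;
    }
    return (m);
}
```

For a chain holding a selectable update from transaction 10 and a not-yet-visible update from transaction 25, the gated maximum stays at 10 while the true maximum is 25, so a checkpoint whose snapshot can see transaction 25 could not rely on the gated value.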
Proposal
- Add a new field on WT_PAGE_MODIFY that tracks the true maximum transaction id seen across every update in a page's update chain during reconciliation — visible or not (excluding only aborted/discarded updates) — populated in the same walk as _rec_upd_select / _rec_upd_select_inmem, independent of the existing visibility-gated max_txn/rec_max_txn. Tracked separately as WT-17951.
- Extend the checkpoint skip logic (e.g. __sync_checkpoint_can_skip) to also skip reconciliation when both:
  - mod->rec_pinned_stable_timestamp matches the pinned stable timestamp this checkpoint would use (via __wt_txn_pinned_stable_timestamp(), src/include/txn_inline.h:927), and
  - the new field's value (WT-17951) is <= txn->snapshot_data.snap_min; i.e. every transaction id ever seen on the page by that reconciliation, visible or not, is below the checkpoint snapshot's snap_min and therefore already guaranteed visible/committed (struct __wt_txn_snapshot, src/include/txn.h:319-329).
This mirrors the existing eviction-side check but applied from checkpoint's perspective, and avoids redundant reconciliation work for pages that eviction has already written using a checkpoint-compatible snapshot.
Notes
- No existing FIXME or partial implementation of this was found; rec_pinned_stable_timestamp's doc comment (btmem.h:387-391) currently frames it only in terms of avoiding duplicate eviction reconciliation, not checkpoint reuse.
- Need to confirm this is safe for non-leaf/internal pages and for the various skip-disqualifying cases already handled by __sync_checkpoint_can_skip (HS, disagg-meta, RTS, recovery, closing).
- Likely only applicable when WT_CONN_PRECISE_CHECKPOINT is enabled, since rec_pinned_stable_timestamp is only meaningfully populated/consumed in that mode today.
- This ticket depends on WT-17951 (open) for the new field the skip check relies on.
- depends on
  - WT-17951 Track the true max transaction id seen on a page during reconciliation, including invisible/newer updates (Open)
- is related to
  - WT-17951 Track the true max transaction id seen on a page during reconciliation, including invisible/newer updates (Open)
  - WT-17680 Use checkpoint snapshot for eviction when precise checkpoint is enabled (Closed)
  - WT-17681 Skip reconciliation for pages already processed by eviction under the same checkpoint snapshot (Needs Scheduling)
- related to
  - WT-17951 Track the true max transaction id seen on a page during reconciliation, including invisible/newer updates (Open)