Prepared committed update on ingest btree may be incorrectly pruned when globally visible but durable timestamp exceeds prune timestamp

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0
    • Affects Version/s: None
    • Component/s: Timestamps
    • Storage Engines, Storage Engines - Transactions
    • 0.626
    • SE Transactions - 2026-05-08, SE Transactions - 2026-05-22
    • 1
    • 35

      Summary

      In src/btree/row_modify.c, the update chain pruning logic for garbage-collection-enabled btrees contains a condition bug: the global-visibility check can bypass the durable-timestamp guard. A prepared+committed update on the ingest btree may be deleted from the update chain because it satisfies __wt_txn_upd_visible_all(), even when its durable timestamp is larger than the prune timestamp. Once the update is gone, step-up cannot find and resolve the prepared update.

      Affected Code

      src/btree/row_modify.c — update chain trimming logic (approximate line 438):

      if (__wt_txn_upd_visible_all(session, upd) ||
        (F_ISSET(CUR2BT(cbt), WT_BTREE_GARBAGE_COLLECT) &&
          (txnid < oldest_id && prune_timestamp != WT_TS_NONE &&
            upd->upd_durable_ts <= prune_timestamp))) {
          if (first == NULL && WT_UPDATE_DATA_VALUE(upd))
              first = upd;
      } else
          first = NULL;
      

      Root Cause

      The two branches of the || are evaluated independently:

      1. __wt_txn_upd_visible_all() returns true for a prepared+committed update whose transaction ID is older than the oldest active transaction — the update is considered "globally visible."
      2. The garbage-collect branch additionally guards with upd->upd_durable_ts <= prune_timestamp, which is the correct check for whether the update is safe to prune on the ingest btree.

      Because the globally-visible path short-circuits the garbage-collect guard, a prepared+committed update with durable_ts > prune_timestamp can still be treated as a pruning candidate (set as first and subsequently dropped from the chain). The ingest btree needs the durable-timestamp guard to apply to all updates, not just those that aren't globally visible.
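
      The bypass can be illustrated with a small standalone model of the condition. This is only a sketch: struct model_upd, MODEL_TS_NONE, model_current_candidate() and model_show_bypass() are hypothetical names invented here, and the result of __wt_txn_upd_visible_all() is modeled as a plain boolean rather than computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for WT_TS_NONE, used only by this model. */
#define MODEL_TS_NONE 0

/* Hypothetical stand-in for the fields the condition reads; not a WiredTiger type. */
struct model_upd {
    bool visible_all;    /* modeled result of __wt_txn_upd_visible_all() */
    uint64_t txnid;
    uint64_t durable_ts; /* models upd->upd_durable_ts */
};

/* Same boolean shape as the condition quoted under Affected Code. */
bool
model_current_candidate(
  const struct model_upd *upd, bool gc_enabled, uint64_t oldest_id, uint64_t prune_ts)
{
    return (upd->visible_all ||
      (gc_enabled && upd->txnid < oldest_id && prune_ts != MODEL_TS_NONE &&
        upd->durable_ts <= prune_ts));
}

/* Globally visible, but durable_ts (200) is above the prune timestamp (100). */
void
model_show_bypass(void)
{
    struct model_upd upd = {true, 5, 200};

    /* The left operand alone makes the update a candidate. */
    assert(model_current_candidate(&upd, true, 10, 100));

    /* Without global visibility, the durable-timestamp guard rejects it. */
    upd.visible_all = false;
    assert(!model_current_candidate(&upd, true, 10, 100));
}
```

      In this model the only difference between the two calls is the global-visibility flag, which is enough to make the durable-timestamp comparison irrelevant.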

      Impact

      During step-up, the disaggregated storage engine iterates the update chain to find and resolve prepared updates. If a prepared+committed update has been pruned because it appeared globally visible, step-up cannot locate it and the resolve step is skipped or fails, leading to data inconsistency.

      Proposed Fix

      For ingest btrees with garbage collection enabled, the prune decision should require both global visibility and durable_ts <= prune_timestamp. The __wt_txn_upd_visible_all() path should not bypass the durable-timestamp guard when WT_BTREE_GARBAGE_COLLECT is set.
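
      One possible reading of this proposal, written as a standalone predicate, is sketched below. It is not the committed fix: sketch_prune_candidate(), sketch_selfcheck() and SKETCH_TS_NONE are hypothetical names, global visibility is passed in as a boolean, and treating an unset prune timestamp as "never prune" on a garbage-collected btree is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for WT_TS_NONE, used only by this sketch. */
#define SKETCH_TS_NONE 0

/*
 * Hypothetical predicate: on a garbage-collected (ingest) btree the
 * durable-timestamp guard applies to every update; other btrees keep
 * the plain global-visibility rule.
 */
bool
sketch_prune_candidate(bool visible_all, bool gc_enabled, uint64_t txnid, uint64_t oldest_id,
  uint64_t durable_ts, uint64_t prune_ts)
{
    bool durable_ok;

    if (!gc_enabled)
        return (visible_all);

    durable_ok = prune_ts != SKETCH_TS_NONE && durable_ts <= prune_ts;
    return (durable_ok && (visible_all || txnid < oldest_id));
}

void
sketch_selfcheck(void)
{
    /* Globally visible but durable_ts > prune_ts: kept on a GC btree. */
    assert(!sketch_prune_candidate(true, true, 5, 10, 200, 100));
    /* Globally visible and durable_ts <= prune_ts: still a candidate. */
    assert(sketch_prune_candidate(true, true, 5, 10, 100, 100));
    /* Non-GC btree: global visibility alone decides, as before. */
    assert(sketch_prune_candidate(true, false, 5, 10, 200, 100));
}
```

      The sketch keeps the existing txnid < oldest_id alternative for garbage-collected btrees and only moves the durable-timestamp comparison in front of both alternatives.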

      Reproduction

      Reproduces in the disaggregated storage test suite, specifically in scenarios that involve:

      1. A prepared transaction that commits with a durable timestamp above the current prune timestamp.
      2. A checkpoint or read that triggers update chain trimming on the ingest btree.
      3. A subsequent step-up that must resolve the prepared update.

      References

      • Code location: src/btree/row_modify.c ~line 438

            Assignee:
            Chenhao Qu
            Reporter:
            Chenhao Qu