Replace WT_UPDATE_PREPARE_ROLLBACK flag with a tombstone appended at the end of the update chain


    • Storage Engines, Storage Engines - Transactions
    • 424.919
    • SE Transactions - 2026-05-22, SE Transactions - 2026-06-05
    • 5

      Background

      When rolling back a prepared transaction for a key that has no prior value (a prepared INSERT), __txn_prepare_rollback_delete_key in src/txn/txn.c inserts a tombstone at the head of the update chain and marks it with the WT_UPDATE_PREPARE_ROLLBACK flag. This flag was introduced to distinguish these system-generated tombstones from regular user tombstones.
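      The current chain shape can be modelled in a few lines of C. This is a minimal, single-threaded sketch that assumes a newest-first, singly linked chain. `struct upd`, `UPD_FLAG_PREPARE_ROLLBACK` and `rollback_insert_head_flagged` are hypothetical stand-ins, not WiredTiger's `WT_UPDATE` or the real `__txn_prepare_rollback_delete_key`. Only the flag value comes from this ticket. The ticket does not say which transaction ID or timestamps the head tombstone carries, so the model leaves them zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified update model; not WiredTiger's WT_UPDATE. */
enum upd_type { UPD_STANDARD, UPD_TOMBSTONE };

/* Value taken from the ticket's description of btmem.h; the name is a stand-in. */
#define UPD_FLAG_PREPARE_ROLLBACK 0x0080u

struct upd {
    uint64_t txnid;
    uint64_t start_ts;
    uint64_t durable_ts;
    enum upd_type type;
    unsigned int flags;
    bool prepared;
    struct upd *next; /* next (older) update in the chain */
};

/*
 * Model of today's behaviour: push the rollback tombstone onto the HEAD of the chain,
 * above the prepared INSERT, and tag it so later code can tell it apart from a user delete.
 * Returns the new head, or NULL if allocation fails.
 */
static struct upd *
rollback_insert_head_flagged(struct upd *head)
{
    /* Model assumption: the prepared update being rolled back is the current head. */
    assert(head != NULL && head->prepared);

    struct upd *tomb = calloc(1, sizeof(*tomb)); /* txnid/timestamps left zero (unspecified) */
    if (tomb == NULL)
        return NULL;
    tomb->type = UPD_TOMBSTONE;
    tomb->flags = UPD_FLAG_PREPARE_ROLLBACK;
    tomb->next = head; /* the prepared update now sits below the tombstone */
    return tomb;
}
```

      After the call, the tombstone is the newest entry and the prepared INSERT sits directly below it. Nothing about its position marks it as a rollback tombstone, so the flag has to do that job.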

      WT-17586 identified the same underlying pattern in the drain path. Appending a globally visible tombstone below the prepared update (at the end of the chain) produces the correct post-rollback chain state without a distinguishing flag, because the tombstone's position in the chain encodes its role.

      Problem

      The WT_UPDATE_PREPARE_ROLLBACK flag forces several call sites to carry special-case logic:

      • src/reconcile/rec_visibility.c (two sites, ~lines 359 and 995): excludes the rollback tombstone from obsolete-update pruning and tracks it separately as prepare_rollback_tombstone for the preserve-prepared reconciliation path.
      • src/btree/row_modify.c (~line 406): excludes the tombstone from update-chain pruning so that the chain is not incorrectly truncated while other threads are reading it.
      • src/btree/bt_debug.c: prints the flag in debug output.
      • src/include/btmem.h: defines the flag as 0x0080u.
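      The kind of guard these pruning sites carry can be sketched as follows. This is a hypothetical composite, not the code in rec_visibility.c or row_modify.c. `prune_obsolete` detaches every update older than the first globally visible one, and the `honor_flag` parameter turns the flag check on or off. The model assumes every non-prepared update is committed. For illustration only, it also assumes the flagged head tombstone would otherwise count as globally visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified update model; not WiredTiger's WT_UPDATE. */
enum upd_type { UPD_STANDARD, UPD_TOMBSTONE };
#define UPD_FLAG_PREPARE_ROLLBACK 0x0080u

struct upd {
    uint64_t start_ts;
    enum upd_type type;
    unsigned int flags;
    bool prepared;    /* prepared updates are never globally visible in this model */
    struct upd *next; /* next (older) update */
};

/* Model assumption: non-prepared updates are committed, so visibility is a timestamp check. */
static bool
globally_visible(const struct upd *u, uint64_t oldest_ts)
{
    return !u->prepared && u->start_ts <= oldest_ts;
}

/*
 * Find the first globally visible update and detach everything older than it.
 * Returns the detached (obsolete) suffix, or NULL if nothing is pruned.
 * With honor_flag set, the flagged rollback tombstone is never used as the
 * truncation point: this is the special case the flag forces on each pruning site.
 */
static struct upd *
prune_obsolete(struct upd *head, uint64_t oldest_ts, bool honor_flag)
{
    for (struct upd *u = head; u != NULL; u = u->next) {
        if (honor_flag && (u->flags & UPD_FLAG_PREPARE_ROLLBACK) != 0)
            continue;
        if (globally_visible(u, oldest_ts)) {
            struct upd *obsolete = u->next;
            u->next = NULL;
            return obsolete;
        }
    }
    return NULL;
}
```

      With `honor_flag` false, the head tombstone becomes the truncation point and the prepared update below it is detached. This is the kind of incorrect truncation the row_modify.c skip is described as preventing.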

      This flag-based approach adds cognitive overhead for every reader of the reconciliation and pruning paths, and every flag check must be kept in sync with any future change to prepared-rollback semantics.

      Proposed Change

      In __txn_prepare_rollback_delete_key (src/txn/txn.c), instead of inserting a tombstone at the head of the update chain with the WT_UPDATE_PREPARE_ROLLBACK flag, append a globally visible tombstone after the prepared update (at the tail of the chain):

      • upd->txnid = WT_TXN_NONE
      • upd->start_ts = WT_TS_NONE, upd->durable_ts = WT_TS_NONE
      • upd->type = WT_UPDATE_TOMBSTONE
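      Below is a minimal sketch of the proposed append, again on a hypothetical single-threaded model. `struct upd`, `TXN_NONE`, `TS_NONE` and `append_rollback_tombstone` stand in for `WT_UPDATE`, `WT_TXN_NONE`, `WT_TS_NONE` and the real code in txn.c. The sketch sets exactly the three fields listed above and hangs the tombstone off the oldest update. It does not model how the real code publishes into a chain that other threads may be reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TXN_NONE 0 /* stand-in for WT_TXN_NONE */
#define TS_NONE 0  /* stand-in for WT_TS_NONE */

/* Hypothetical, simplified update model; not WiredTiger's WT_UPDATE. */
enum upd_type { UPD_STANDARD, UPD_TOMBSTONE };

struct upd {
    uint64_t txnid;
    uint64_t start_ts;
    uint64_t durable_ts;
    enum upd_type type;
    bool prepared;
    struct upd *next; /* next (older) update */
};

/*
 * Proposed behaviour: append a globally visible tombstone at the TAIL of the chain,
 * below the prepared update. No flag is set; its position identifies it.
 * Returns 0 on success, -1 if allocation fails. Single-threaded model only:
 * publishing into a chain that concurrent readers walk may need memory ordering.
 */
static int
append_rollback_tombstone(struct upd **headp)
{
    struct upd *tomb = calloc(1, sizeof(*tomb));
    if (tomb == NULL)
        return -1;
    tomb->txnid = TXN_NONE;
    tomb->start_ts = TS_NONE;
    tomb->durable_ts = TS_NONE;
    tomb->type = UPD_TOMBSTONE;

    /* Walk to the NULL link after the oldest update and hang the tombstone there. */
    while (*headp != NULL)
        headp = &(*headp)->next;
    *headp = tomb;
    return 0;
}
```

      For a chain that holds only the prepared INSERT, a successful call leaves the prepared update as the head, the tombstone as its `next`, and the tombstone's `next` as NULL.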

      With this tombstone as the oldest entry in the chain, reconciliation and pruning see an ordinary, globally visible tombstone below the prepared update, so no flag is needed to distinguish it. Once the prepared update is removed (via RTS or the rollback-stable skip), the tombstone correctly represents "this key does not exist."
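      The "this key does not exist" reasoning can be checked with a small read model. The sketch assumes a reader that walks the chain newest-first and, if no update is visible, falls back to whatever is on disk for the key. The `ondisk` argument stands in for such a value. One example would be a prepared value written out by an earlier reconciliation, but that scenario is an assumption of this sketch, not something the ticket states. `read_key` and `struct upd` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified update model; not WiredTiger's WT_UPDATE. */
enum upd_type { UPD_STANDARD, UPD_TOMBSTONE };

struct upd {
    uint64_t start_ts; /* 0 plays the role of WT_TS_NONE: visible at every read timestamp */
    enum upd_type type;
    int value;         /* payload for UPD_STANDARD updates */
    struct upd *next;  /* next (older) update */
};

/*
 * Newest-first read at read_ts. The first visible update decides the result, and a
 * tombstone means "not found". If no update in the chain is visible, fall back to the
 * on-disk value (ondisk == NULL models "nothing on disk for this key").
 */
static bool
read_key(const struct upd *head, uint64_t read_ts, const int *ondisk, int *valuep)
{
    for (const struct upd *u = head; u != NULL; u = u->next) {
        if (u->start_ts > read_ts)
            continue; /* not yet visible at this read timestamp */
        if (u->type == UPD_TOMBSTONE)
            return false;
        *valuep = u->value;
        return true;
    }
    if (ondisk != NULL) {
        *valuep = *ondisk;
        return true;
    }
    return false;
}
```

      The tombstone's timestamp is zero, so it is visible at every read timestamp. After the prepared update is removed, a reader therefore stops at the tombstone rather than falling through to a stale on-disk value. With an empty chain, the same reader would return that value.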

      After this change, remove WT_UPDATE_PREPARE_ROLLBACK and all its call sites:

      • Remove the flag definition from src/include/btmem.h.
      • Remove the F_ISSET(upd, WT_UPDATE_PREPARE_ROLLBACK) guards and the prepare_rollback_tombstone tracking variable from src/reconcile/rec_visibility.c.
      • Remove the corresponding skip in src/btree/row_modify.c.
      • Remove the debug-print in src/btree/bt_debug.c.
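      A flag-free pruning pass can be sketched on the proposed chain layout. In this hypothetical model, `struct upd` and `prune_obsolete` are stand-ins, the chain is newest-first, and non-prepared updates are treated as committed. On this layout, the first globally visible update is the tail tombstone itself. Pruning therefore has nothing below it to truncate, and the prepared update above it stays in place without any special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified update model; not WiredTiger's WT_UPDATE. */
enum upd_type { UPD_STANDARD, UPD_TOMBSTONE };

struct upd {
    uint64_t start_ts;
    enum upd_type type;
    bool prepared;    /* prepared updates are never globally visible in this model */
    struct upd *next; /* next (older) update */
};

/* Model assumption: non-prepared updates are committed, so visibility is a timestamp check. */
static bool
globally_visible(const struct upd *u, uint64_t oldest_ts)
{
    return !u->prepared && u->start_ts <= oldest_ts;
}

/* Flag-free pruning: detach everything older than the first globally visible update. */
static struct upd *
prune_obsolete(struct upd *head, uint64_t oldest_ts)
{
    for (struct upd *u = head; u != NULL; u = u->next) {
        if (globally_visible(u, oldest_ts)) {
            struct upd *obsolete = u->next;
            u->next = NULL;
            return obsolete;
        }
    }
    return NULL;
}
```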

      Relationship to WT-17586

      WT-17586 applied this same end-of-chain tombstone pattern in the drain (conn_layered_ingest.c) path. This ticket generalises it to the standard prepare rollback path in txn.c and removes the now-unnecessary flag machinery.

      Verification

      Existing prepare-rollback and RTS tests must continue to pass: the Python suite test/suite/test_prepare*.py, the csuite test test_wt6943_g_prepare_largeDS, and format. Add or extend a test for the INSERT-then-rollback case (key absent both before and after rollback). It should confirm that the tombstone is written to disk correctly on eviction and that the key still reads as absent after the page is re-read.

            Assignee:
            Chenhao Qu
            Reporter:
            Chenhao Qu
            Votes:
            0
            Watchers:
            1
