Commit may be rolled back after we have logged the transaction


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical - P2
    • None
    • Affects Version/s: None
    • Component/s: Transactions
    • Storage Engines, Storage Engines - Transactions
    • SE Transactions - 2025-08-01
    • 3
    • v8.2, v8.1, v8.0, v7.0, v6.0

          /*
           * Release our snapshot in case it is keeping data pinned (this is particularly important for
           * checkpoints). Before releasing our snapshot, copy values into any positioned cursors so they
           * don't point to updates that could be freed once we don't have a snapshot. If this transaction
           * is prepared, then copying values would have been done during prepare.
           */
          if (session->ncursors > 0 && !prepare) {
              WT_DIAGNOSTIC_YIELD;
              WT_ERR(__wt_session_copy_values(session));
          }
          __wt_txn_release_snapshot(session);
      

      In the transaction commit code, we release the snapshot at the start of the function, before marking the updates as committed. This can lead to failed repeated reads if the commit uses a timestamp. Here's an example:

      • Transaction A starts to commit with timestamp 200.
      • We release the snapshot and context switch.
      • Another session starts a read transaction with read timestamp 100.
      • It reads the update that was written by transaction A. The update still has timestamp 0 because the commit hasn't finished, so it is visible to the read transaction.
      • Transaction A resumes commit and finishes marking the updates with timestamp 200.
      • The read transaction reads the same update again. This time it is not visible because the update now has timestamp 200, which is larger than the read timestamp of 100.
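
      The interleaving above can be replayed with a small model. This is only a sketch: `model_update`, `model_ts_visible`, and `model_repeated_read_fails` are hypothetical names, not WiredTiger code, and the model assumes the non-timestamp part of the visibility check already passes once the snapshot is released, as the scenario describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified update; not WiredTiger's real update structure. */
typedef struct {
    uint64_t commit_ts; /* 0 until the commit stamps the update */
} model_update;

/* Timestamp-only visibility: an update is visible if its timestamp is not newer than the read. */
static bool
model_ts_visible(uint64_t read_ts, const model_update *u)
{
    assert(u != NULL);
    return u->commit_ts <= read_ts;
}

/* Replays the scenario and returns true if the two reads of the same update disagree. */
static bool
model_repeated_read_fails(void)
{
    model_update u = {.commit_ts = 0}; /* written by transaction A, not yet stamped */
    const uint64_t read_ts = 100;

    bool first = model_ts_visible(read_ts, &u);  /* 0 <= 100: visible */
    u.commit_ts = 200;                           /* transaction A resumes and stamps the update */
    bool second = model_ts_visible(read_ts, &u); /* 200 > 100: no longer visible */

    return first && !second;
}
```

      In this model the first read sees the update and the second does not, which is the repeated-read failure described above.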

      We should release the snapshot early only if the transaction is not timestamped, such as the checkpoint transaction described in the comment. We should also ensure that the transaction can no longer be rolled back after we release the snapshot. Otherwise, repeated reads may still fail even if the transaction is not timestamped.
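
      The two conditions proposed here can be written as a single predicate. This is a minimal sketch of the rule, not a patch: `model_commit_state` and `model_can_release_snapshot_early` are hypothetical names, and how the real commit path would determine either flag is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a commit in progress. */
typedef struct {
    bool has_commit_ts;  /* the commit carries a timestamp */
    bool can_still_fail; /* a later commit step may still error out and force a rollback */
} model_commit_state;

/* Early release is allowed only for a non-timestamped commit that can no longer roll back. */
static bool
model_can_release_snapshot_early(const model_commit_state *s)
{
    assert(s != NULL);
    return !s->has_commit_ts && !s->can_still_fail;
}
```

      Under this rule a timestamped commit always keeps its snapshot until its updates are marked as committed, and a non-timestamped one keeps it until no remaining step can fail.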

      This can also lead to data corruption or a server crash if the updates are evicted or checkpointed before they are marked as committed. Here's the scenario for data corruption:

      • Transaction A has done a set of updates.
      • We start to commit transaction A.
      • We release transaction A's snapshot and context switch.
      • Checkpoint writes some updates of the transaction. (If an update is evicted instead, the following rollback may crash because it accesses freed memory.)
      • Transaction A resumes commit. However, it hits an error and decides to roll back.

      In this case, we may write to disk some updates that should have been reverted. This may explain some of the inconsistent indices we see in the field.
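
      The corruption scenario can be modelled in the same way. This is a sketch under a stated assumption: once the writer's snapshot is released, the checkpoint no longer treats the writer's uncommitted updates as in flight and may write them. All names (`model_upd`, `model_checkpoint`, `model_aborted_update_on_disk`) are hypothetical and do not reflect WiredTiger's reconciliation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { UPD_UNCOMMITTED, UPD_COMMITTED, UPD_ABORTED } model_state;

/* Hypothetical update with a flag recording whether a checkpoint wrote it. */
typedef struct {
    model_state state;
    bool on_disk;
} model_upd;

/*
 * Assumption: a checkpoint writes committed updates, and also uncommitted ones
 * whose writer has already released its snapshot.
 */
static void
model_checkpoint(model_upd *u, bool writer_snapshot_released)
{
    assert(u != NULL);
    if (u->state == UPD_COMMITTED ||
        (u->state == UPD_UNCOMMITTED && writer_snapshot_released))
        u->on_disk = true;
}

/* Returns true if an update that was later rolled back ended up on disk. */
static bool
model_aborted_update_on_disk(bool release_snapshot_early)
{
    model_upd u = {.state = UPD_UNCOMMITTED, .on_disk = false};

    model_checkpoint(&u, release_snapshot_early); /* checkpoint runs mid-commit */
    u.state = UPD_ABORTED;                        /* commit hits an error and rolls back */

    return u.on_disk; /* rollback does not undo what the checkpoint already wrote */
}
```

      With early release the aborted update is left on disk; without it, the checkpoint skips the update and the rollback is clean.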

              Assignee:
              Chenhao Qu
              Reporter:
              Chenhao Qu