Modify that sees an outdated tombstone returns WT_NOTFOUND instead of WT_ROLLBACK


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Cursors
    • None
    • Storage Engines - Foundations
    • None
    • None

      Description:

      When cursor->modify() is called with a read timestamp that falls between a tombstone and a subsequent re-insert on the same key, it incorrectly returns WT_NOTFOUND rather than WT_ROLLBACK.

      Scenario:

      1. Key K is inserted at ts=T1.
      2. Key K is removed at ts=T2 (tombstone at T2).
      3. Key K is re-inserted at ts=T3 (T3 > T2).
      4. A write transaction with read_ts=R (T2 ≤ R < T3) calls cursor->modify() on key K.

      At read_ts=R the tombstone at T2 is visible and the re-insert at T3 is not. The modify encounters the tombstone and returns WT_NOTFOUND.
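The visibility rule behind this scenario can be sketched with a toy model. This is not WiredTiger code: the `Update` class, the `visible_update` helper, and the concrete timestamps 10, 20, and 30 are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Update:
    commit_ts: int
    value: Optional[bytes]  # None models a tombstone (a remove)


def visible_update(chain: List[Update], read_ts: int) -> Optional[Update]:
    """Return the newest update committed at or before read_ts, if any."""
    candidates = [u for u in chain if u.commit_ts <= read_ts]
    return max(candidates, key=lambda u: u.commit_ts) if candidates else None


# Illustrative timestamps only; any T1 < T2 < T3 behaves the same way.
T1, T2, T3 = 10, 20, 30
chain = [Update(T1, b"v1"), Update(T2, None), Update(T3, b"v2")]

for read_ts in (T1, T2, T3 - 1, T3):
    seen = visible_update(chain, read_ts)
    state = "tombstone" if seen.value is None else seen.value.decode()
    print(f"read_ts={read_ts}: sees {state} committed at ts={seen.commit_ts}")
```

For every read timestamp in [T2, T3) the model sees the tombstone, while the re-insert at T3 exists but is not visible. That is the window in which the modify is issued.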

      Expected behaviour:

      The re-insert at T3 is a committed but invisible update. __curfile_update_check should detect it and return WT_ROLLBACK, signalling the caller to retry with a higher read timestamp.
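As a rough sketch of the expected ordering, assuming the conflict check runs before the key's visibility is evaluated, the logic could look like the following. The function name `expected_modify_result` and the string return codes are hypothetical stand-ins, not WiredTiger's implementation.

```python
from typing import List, Optional, Tuple

# Symbolic stand-ins for the return codes named in this ticket.
OK, WT_NOTFOUND, WT_ROLLBACK = "OK", "WT_NOTFOUND", "WT_ROLLBACK"

# (commit_ts, value); a value of None models a tombstone.
Update = Tuple[int, Optional[bytes]]


def expected_modify_result(chain: List[Update], read_ts: int) -> str:
    # Conflict check first: any committed update newer than the read
    # timestamp means the writer is working from a stale view.
    if any(ts > read_ts for ts, _ in chain):
        return WT_ROLLBACK
    # No newer update exists, so the newest update in the chain is visible.
    if not chain:
        return WT_NOTFOUND
    newest = max(chain, key=lambda u: u[0])
    return WT_NOTFOUND if newest[1] is None else OK


chain = [(10, b"v1"), (20, None), (30, b"v2")]
print(expected_modify_result(chain, 25))      # re-insert at 30 is invisible
print(expected_modify_result(chain, 30))      # re-insert is visible
print(expected_modify_result(chain[:2], 25))  # genuinely removed key
```

In this model WT_NOTFOUND is returned only when the tombstone is the newest committed update, which is the case where a missing key is the correct answer.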

      Actual behaviour:

      WT_NOTFOUND is returned. When the in-memory update chain has been cleared (e.g. after a checkpoint), __curfile_update_check falls back to the on-page time window, which may reflect only the tombstone and so fails to detect the later committed but invisible re-insert. __wti_cursor_valid then sees the visible tombstone and returns WT_NOTFOUND instead of WT_ROLLBACK.
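The fallback described above can be illustrated with a deliberately simplified model. Everything here is an assumption for illustration: the `(start_ts, stop_ts)` tuple standing in for the on-page time window, the helper names, and the timestamps. It only mirrors the behaviour as described in this ticket, not the actual code paths.

```python
from typing import List, Optional, Tuple

OK, WT_NOTFOUND, WT_ROLLBACK = "OK", "WT_NOTFOUND", "WT_ROLLBACK"

Update = Tuple[int, Optional[bytes]]      # (commit_ts, value); None = tombstone
TimeWindow = Tuple[int, Optional[int]]    # (start_ts, stop_ts); stop_ts None = live


def newest_known_ts(chain: List[Update], window: TimeWindow) -> int:
    """Newest timestamp the conflict check can see in this model."""
    if chain:
        return max(ts for ts, _ in chain)
    # Chain cleared: fall back to the on-page time window.
    start_ts, stop_ts = window
    return stop_ts if stop_ts is not None else start_ts


def modelled_modify(chain: List[Update], window: TimeWindow, read_ts: int) -> str:
    if newest_known_ts(chain, window) > read_ts:
        return WT_ROLLBACK
    # Nothing newer was detected, so the newest known state is treated as visible.
    if chain:
        removed = max(chain, key=lambda u: u[0])[1] is None
    else:
        removed = window[1] is not None
    return WT_NOTFOUND if removed else OK


chain = [(10, b"v1"), (20, None), (30, b"v2")]
stale_window = (10, 20)  # reflects only the insert at T1 and the tombstone at T2

print(modelled_modify(chain, stale_window, 25))  # chain intact
print(modelled_modify([], stale_window, 25))     # chain cleared
```

With the chain intact the model reports WT_ROLLBACK. With the chain cleared and a window that knows only about the tombstone, the same call reports WT_NOTFOUND, matching the behaviour reported here.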

      Impact:

      A WT_NOTFOUND return from modify is treated as a silent no-op by the caller rather than a conflict requiring retry. This means a write transaction can miss a write-write conflict and silently fail to apply a modification that should either succeed or be retried. For MongoDB, any workload that mixes reads and writes in the same transaction across a key that has been deleted and re-inserted within the visible timestamp window is at risk of silently dropping a modify, which could result in data corruption.
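To make the impact concrete, here is a minimal sketch of a hypothetical caller that retries on WT_ROLLBACK and treats WT_NOTFOUND as "nothing to do". The `run_with_retry` loop, the two stand-in modify functions, and the retry step are invented for illustration and do not describe any real caller.

```python
from typing import Callable, Tuple

OK, WT_NOTFOUND, WT_ROLLBACK = "OK", "WT_NOTFOUND", "WT_ROLLBACK"
T2, T3 = 20, 30  # tombstone and re-insert timestamps (illustrative)


def correct_modify(read_ts: int) -> str:
    # Expected behaviour: a stale reader is told to retry.
    return WT_ROLLBACK if read_ts < T3 else OK


def buggy_modify(read_ts: int) -> str:
    # Reported behaviour: inside [T2, T3) the tombstone wins.
    return WT_NOTFOUND if T2 <= read_ts < T3 else correct_modify(read_ts)


def run_with_retry(modify: Callable[[int], str], read_ts: int,
                   step: int = 5, max_attempts: int = 5) -> Tuple[bool, int]:
    """Return (applied, final_read_ts) for a caller that retries on rollback."""
    for _ in range(max_attempts):
        rc = modify(read_ts)
        if rc == WT_ROLLBACK:
            read_ts += step  # retry with a higher read timestamp
            continue
        # WT_NOTFOUND is treated as a no-op, not as a conflict.
        return rc == OK, read_ts
    return False, read_ts


print(run_with_retry(correct_modify, 25))  # (True, 30)
print(run_with_retry(buggy_modify, 25))    # (False, 25)
```

In the second call the modification is never applied and the caller has no signal that a retry was needed.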

      Related: WT-17247 (same class of bug for cursor->remove())

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Alexander Pullen
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: