WiredTiger / WT-8881

It is possible to commit with a durable timestamp earlier than that of data read by the same transaction

    • Type: Technical Debt
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      WT-8747 describes (at some length) a possible data consistency problem: a transaction commits with a durable timestamp later than its commit timestamp, and another transaction then reads its writes and commits with an earlier durable timestamp.
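
      To make the hazard concrete, here is a toy model (not WiredTiger code). It assumes, as a simplification, that a checkpoint taken at a given stable timestamp keeps exactly the transactions whose durable timestamp is at or before stable. The `Commit` class, the `durable_at` helper, and all timestamp values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A committed transaction in the toy model (hypothetical type)."""
    name: str
    commit_ts: int
    durable_ts: int


def durable_at(stable_ts, commits):
    """Names of transactions the model treats as durable at stable_ts.

    Assumption: a transaction survives iff durable_ts <= stable_ts.
    """
    return {c.name for c in commits if c.durable_ts <= stable_ts}


# The writer commits at 10 but is not durable until 20.
writer = Commit("writer", commit_ts=10, durable_ts=20)
# The reader sees the writer's value (visible from commit_ts 10 onward)
# and then commits with a durable timestamp of 15, earlier than 20.
reader = Commit("reader", commit_ts=15, durable_ts=15)

# With stable at 18, only the reader survives in this model.
survivors = durable_at(18, [writer, reader])
```

      In this model the reader's commit survives while the write it depended on does not, which is the kind of inconsistency the ticket is concerned with.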

      We believe that the problem itself does not affect MongoDB, and finding a way to prohibit these commits without interfering with MongoDB proved problematic, so WT-8747 was closed by documenting the issue as a hazard.

      It would be better to prohibit these commits, so this should ideally be revisited at some future point, though probably not until the issues related to out-of-order updates have been resolved more thoroughly.

      This GitHub pull request contains most of a solution that works for WiredTiger; in theory it should also work for MongoDB, but that has not been explicitly tested: https://github.com/wiredtiger/wiredtiger/pull/7485

      It works by tracking the most recent durable timestamp read by each transaction and requiring the commit-time durable timestamp to not be before this.
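
      The tracking rule described above can be sketched roughly as follows. This is an illustrative Python model, not the code in the pull request: the `Txn` class, its method names, and `DurableOrderError` are all hypothetical, and the real implementation lives in WiredTiger's C transaction code.

```python
class DurableOrderError(Exception):
    """Raised when a commit's durable timestamp precedes data it read."""


class Txn:
    """Toy transaction that remembers the latest durable timestamp it read."""

    def __init__(self):
        # Most recent durable timestamp of any value this transaction read.
        self.max_read_durable_ts = 0

    def note_read(self, value_durable_ts):
        """Record the durable timestamp attached to a value being read."""
        self.max_read_durable_ts = max(self.max_read_durable_ts, value_durable_ts)

    def commit(self, durable_ts):
        """Reject the commit if durable_ts is before anything that was read."""
        if durable_ts < self.max_read_durable_ts:
            raise DurableOrderError(
                f"durable timestamp {durable_ts} is before read durable "
                f"timestamp {self.max_read_durable_ts}"
            )
        return durable_ts


txn = Txn()
txn.note_read(20)  # value whose writer became durable at 20
txn.note_read(12)  # an older value does not lower the tracked maximum

try:
    txn.commit(15)  # earlier than 20, so the model rejects it
    rejected = False
except DurableOrderError:
    rejected = True

accepted_ts = txn.commit(20)  # equal to the tracked maximum is allowed
```

      Keeping only a running maximum means the check at commit time is a single comparison, at the cost of updating one field on each read.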

      It is known to be missing one piece: it does not track the durable timestamps of values read from history, which in theory can be after stable. (Durable timestamps at or before stable do not actually need to be tracked, because committing with a durable timestamp at or before stable is prohibited. The implementation tracks all durable timestamps to avoid unnecessary reads of stable, but this consideration still limits the impact of the missing piece.) No existing test covers this situation. I wrote a somewhat messy one, but it trips on other issues. I might upload it here later; if I never get to that, it can be rederived by hacking up test_durable_ts04.py in the existing pull request.

      (Note that the earlier approach in WT-8747 of prohibiting reads of data between its commit and durable timestamps breaks important optimizations in MongoDB and isn't workable.)

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            dholland+wt@sauclovia.org David Holland