WiredTiger / WT-8747

Reading between commit_timestamp and durable_timestamp can produce inconsistency




      There is a problem with prepared transactions whose durable timestamp is greater than their commit timestamp: another transaction can read their updates, commit with a durable timestamp earlier than theirs, and then get checkpointed without them. A crash then causes the first transaction to disappear but not the second, which depended on it.

      For example:

      • commit "mykey"="abcde" with commit_timestamp=20,durable_timestamp=30
      • read "mykey" at time 22, see "abcde"
      • write "otherkey"="mykey was abcde"
      • commit with commit_timestamp = durable_timestamp = 25
      • set stable to 28
      • checkpoint, crash
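      • after recovery, read "mykey" and "otherkey": "otherkey" says "mykey was abcde", but "mykey" no longer has that value, because its durable timestamp (30) is past stable (28) and so it was not in the checkpoint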
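
      The sequence above can be replayed in a toy model. This is only a sketch, not the WiredTiger API: `ToyStore`, `Update`, and `run_scenario` are hypothetical names, and the model assumes that a read at timestamp T sees any update with commit timestamp <= T, and that a checkpoint keeps exactly the updates whose durable timestamp is <= stable.

```python
from dataclasses import dataclass


@dataclass
class Update:
    key: str
    value: str
    commit_ts: int
    durable_ts: int


class ToyStore:
    """Hypothetical toy model of timestamped updates; not WiredTiger code."""

    def __init__(self):
        self.updates = []

    def commit(self, key, value, commit_ts, durable_ts=None):
        # Without an explicit durable timestamp, durable == commit.
        if durable_ts is None:
            durable_ts = commit_ts
        self.updates.append(Update(key, value, commit_ts, durable_ts))

    def read(self, key, read_ts):
        # Assumed visibility rule: only the commit timestamp is checked.
        visible = [u for u in self.updates
                   if u.key == key and u.commit_ts <= read_ts]
        if not visible:
            return None
        return max(visible, key=lambda u: u.commit_ts).value

    def checkpoint_and_crash(self, stable_ts):
        # Assumed checkpoint rule: only updates durable at stable survive.
        survivor = ToyStore()
        survivor.updates = [u for u in self.updates
                            if u.durable_ts <= stable_ts]
        return survivor


def run_scenario():
    store = ToyStore()
    store.commit("mykey", "abcde", commit_ts=20, durable_ts=30)
    seen = store.read("mykey", read_ts=22)
    store.commit("otherkey", f"mykey was {seen}", commit_ts=25)
    recovered = store.checkpoint_and_crash(stable_ts=28)
    return (seen,
            recovered.read("mykey", read_ts=28),
            recovered.read("otherkey", read_ts=28))
```

      In this model `run_scenario()` returns the value read at time 22, then the two values read after recovery: "mykey" comes back empty while "otherkey" still records what was read from it.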

      There are two manifestations of this, one as above and the other (which is not actually different but seems a lot scarier) when the commit timestamp of the first transaction is < stable. (This is allowed as long as it prepares after stable; stable can move up before it commits and only the durable timestamp is required to be past the new stable.)

      Discussion with @haribabu.kommi and @keith.bostic turned up the suggested solution of prohibiting reads between an update's commit timestamp and its durable timestamp. I am looking into this. It appears to require only a new case in __wt_txn_upd_visible_type, an extra visibility type code, a check in __wt_txn_read_upd_list_internal, and possibly a new error code to return if we want to be explicit about it. Not going to get the patch done and tested tonight, but will likely have it early next week.
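
      As a rough illustration of the suggested rule (not the eventual patch), the check could look like the sketch below. Everything here is hypothetical: the `Visibility` values, `ReadBeforeDurableError`, `classify`, and `read_update` are made-up names, and the sketch assumes the prohibited window is half-open, so that a read exactly at the durable timestamp is allowed.

```python
from enum import Enum, auto


class Visibility(Enum):
    """Hypothetical visibility codes; not WiredTiger's actual values."""
    NOT_VISIBLE = auto()
    NOT_YET_DURABLE = auto()  # the extra case: commit_ts <= read_ts < durable_ts
    VISIBLE = auto()


class ReadBeforeDurableError(Exception):
    """Hypothetical explicit error for a read inside the prohibited window."""


def classify(read_ts, commit_ts, durable_ts):
    # The update committed after the read timestamp: not visible at all.
    if read_ts < commit_ts:
        return Visibility.NOT_VISIBLE
    # Committed but not yet durable at read_ts: the new prohibited case.
    if read_ts < durable_ts:
        return Visibility.NOT_YET_DURABLE
    return Visibility.VISIBLE


def read_update(value, read_ts, commit_ts, durable_ts):
    kind = classify(read_ts, commit_ts, durable_ts)
    if kind is Visibility.NOT_YET_DURABLE:
        raise ReadBeforeDurableError(
            f"read at {read_ts} falls between commit {commit_ts} "
            f"and durable {durable_ts}")
    return value if kind is Visibility.VISIBLE else None
```

      With the numbers from the example, a read of "mykey" at time 22 would be rejected rather than returning "abcde", so the dependent write to "otherkey" could not happen. When commit and durable timestamps are equal the window is empty and nothing changes.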





              keith.bostic@mongodb.com Keith Bostic (Inactive)
              dholland+wt@sauclovia.org David Holland