  1. WiredTiger
  2. WT-9439

Don't attempt to evict using a checkpoint-cursor snapshot

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: WT11.0.0, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Currently it's possible for an application thread to be borrowed for eviction while it's reading from a checkpoint cursor.

      This produces suboptimal results. First, we don't have the code for temporarily substituting a new snapshot at this point (it seems to have been proposed and written in WT-6624 but removed before the branch was merged), so eviction ends up using the checkpoint's snapshot, which in most cases will be extremely old. Since on this path we can only evict visible changes, eviction won't be able to accomplish anything at all even if everything proceeded as it was supposed to.

      Second, it doesn't work. In general, eviction switches trees. Since the transaction IDs in the checkpoint-cursor transaction may be from an earlier write generation, comparing them to updates in another tree (likely a live tree using the current write generation) produces nonsense results. Furthermore, the checkpoint-cursor hooks in the visibility checks test WT_READING_CHECKPOINT (which is a function of the tree) rather than F_ISSET(txn, WT_TXN_IS_CHECKPOINT), so they don't come into play. We end up in a situation where a current update is visible_all (based on the current transaction state) but not visible (because it isn't in the very old snapshot); this is logically inconsistent and can lead to subtle, strange problems.
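      To make the visible_all-but-not-visible inconsistency concrete, here is a deliberately oversimplified sketch. The sketch_* names and the bare ID comparisons are assumptions for illustration only; the real visibility checks also involve snap_min, the concurrent-transaction list, timestamps and write generations, none of which are modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified snapshot: IDs at or above snap_max are invisible. */
struct sketch_snapshot {
    uint64_t snap_max;
};

/*
 * Simplified "visible to everyone" test: compares against the global oldest
 * running transaction ID, that is, the *current* transaction state.
 */
static bool
sketch_visible_all(uint64_t update_id, uint64_t oldest_id)
{
    return (update_id < oldest_id);
}

/*
 * Simplified "visible to this reader" test: compares against the reader's
 * own snapshot, which for a checkpoint cursor can be very old.
 */
static bool
sketch_visible(uint64_t update_id, const struct sketch_snapshot *snap)
{
    return (update_id < snap->snap_max);
}
```

      With a checkpoint-era snapshot whose snap_max is 100 and a current oldest ID of 5000, an update with ID 4000 passes sketch_visible_all but fails sketch_visible, which is the contradictory state described above. With a current snapshot (snap_max at or beyond the oldest ID) the two checks agree.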

      The fix is to bail out of __wt_cache_eviction_check early if we have a checkpoint-cursor transaction, since eviction won't be able to do anything useful anyway, and further down in __evict_review to assert that we didn't get there with one.
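      The shape of that fix can be sketched roughly as follows. This is not the actual patch: the sketch_* types, the flag value and the cache_full parameter are hypothetical stand-ins for the session, transaction and cache state that __wt_cache_eviction_check and __evict_review really operate on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_TXN_IS_CHECKPOINT; the value is arbitrary. */
#define SKETCH_TXN_IS_CHECKPOINT 0x1u

struct sketch_txn {
    uint32_t flags;
};

struct sketch_session {
    struct sketch_txn txn;
    int evictions_attempted; /* Counts how often this thread was borrowed. */
};

static bool
sketch_txn_is_checkpoint_cursor(const struct sketch_session *session)
{
    return ((session->txn.flags & SKETCH_TXN_IS_CHECKPOINT) != 0);
}

/* Analog of __evict_review: we must never get here with such a transaction. */
static void
sketch_evict_review(struct sketch_session *session)
{
    assert(!sketch_txn_is_checkpoint_cursor(session));
    ++session->evictions_attempted;
}

/*
 * Analog of __wt_cache_eviction_check: returns true if the application
 * thread was borrowed for eviction.
 */
static bool
sketch_cache_eviction_check(struct sketch_session *session, bool cache_full)
{
    /* Bail out early: a checkpoint-cursor snapshot can't evict anything. */
    if (sketch_txn_is_checkpoint_cursor(session))
        return (false);
    if (!cache_full)
        return (false);
    sketch_evict_review(session);
    return (true);
}
```

      The early return keeps checkpoint-cursor readers out of eviction entirely, and the assertion catches any other path that reaches the review step with one.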

      Meanwhile, assert in the visibility checks that WT_READING_CHECKPOINT matches F_ISSET(txn, WT_TXN_IS_CHECKPOINT); if these don't match, something's gone off the rails. If a valid reason shows up to read another tree using a checkpoint-cursor transaction, this logic should probably be changed to use the WT_TXN_IS_CHECKPOINT flag instead. (That flag should probably get a longer name so it isn't confused with the transaction used to take a checkpoint, but that's a different issue.) However, in that case it will probably need explicit write-generation logic too, and that's likely to be a mess. I don't think any such cases exist, though.
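      A rough sketch of that consistency assertion, again with hypothetical sketch_* stand-ins rather than the real WT_READING_CHECKPOINT macro and transaction flags: the tree-side state and the transaction-side flag are compared before the tree-side state is used to select the checkpoint visibility path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_TXN_IS_CHECKPOINT; the value is arbitrary. */
#define SKETCH_TXN_IS_CHECKPOINT 0x1u

/* Hypothetical tree state standing in for what WT_READING_CHECKPOINT tests. */
struct sketch_tree {
    bool reading_checkpoint;
};

struct sketch_txn {
    uint32_t flags;
};

/* True when the tree-side and transaction-side views agree. */
static bool
sketch_checkpoint_state_consistent(
  const struct sketch_tree *tree, const struct sketch_txn *txn)
{
    bool txn_is_checkpoint = (txn->flags & SKETCH_TXN_IS_CHECKPOINT) != 0;

    return (tree->reading_checkpoint == txn_is_checkpoint);
}

/*
 * Decide whether a visibility check should take the checkpoint-cursor path.
 * The decision is still a function of the tree; the assertion only verifies
 * that the transaction agrees with it.
 */
static bool
sketch_use_checkpoint_visibility(
  const struct sketch_tree *tree, const struct sketch_txn *txn)
{
    assert(sketch_checkpoint_state_consistent(tree, txn));
    return (tree->reading_checkpoint);
}
```

      A checkpoint-cursor transaction paired with a live tree (or the reverse) fails the consistency test, which is exactly the mismatch that borrowing such a thread for eviction used to create.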

            keith.bostic@mongodb.com Keith Bostic (Inactive)
            dholland+wt@sauclovia.org David Holland