Type: Bug
Resolution: Fixed
Priority: Minor - P4
Fix Version/s: WT11.0.0, 6.1.0-rc0
Affects Version/s: None
Component/s: None
Labels: None
Sprint: None
Story Points: None

Currently it's possible for an application thread to be borrowed for eviction while it's reading from a checkpoint cursor.

This produces suboptimal results. First, we have no code at this point for temporarily substituting a new snapshot (it seems to have been proposed and written in WT-6624 but removed before the branch was merged), so eviction ends up using the checkpoint's snapshot, which in most cases is extremely old. Since on this path we can only evict visible changes, eviction can't accomplish anything even if everything else proceeds as intended.

Second, it doesn't work. In general, eviction switches trees. The transaction IDs in the checkpoint-cursor transaction may be from an earlier write generation, so comparing them to updates in another tree (likely a live tree using the current write generation) produces nonsense results. Furthermore, the checkpoint-cursor hooks in the visibility checks test WT_READING_CHECKPOINT (which is a function of the tree) rather than F_ISSET(txn, WT_TXN_IS_CHECKPOINT), so they don't come into play. We end up in a situation where a current update is visible_all (based on the current transaction state) but not visible (because it isn't in the very old snapshot). This is logically inconsistent and can lead to subtle, strange problems.
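To make the visible_all-but-not-visible contradiction concrete, here is a deliberately simplified sketch. It assumes a toy visibility model in which an update is globally visible when its ID is below the oldest running transaction ID, and visible to a snapshot when its ID is below the snapshot's upper bound. All names (toy_snapshot, toy_visible_all, toy_visible, toy_consistent) are hypothetical and are not WiredTiger functions. Timestamps, concurrent-transaction lists and write generations are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy snapshot: only an upper bound on visible transaction IDs. */
struct toy_snapshot {
    uint64_t snap_max; /* IDs at or above this are not in the snapshot */
};

/* Toy "visible to everyone": the update is older than every running transaction. */
static bool
toy_visible_all(uint64_t update_id, uint64_t oldest_id)
{
    return update_id < oldest_id;
}

/* Toy "visible to this snapshot": the update committed before the snapshot was taken. */
static bool
toy_visible(uint64_t update_id, const struct toy_snapshot *snap)
{
    return update_id < snap->snap_max;
}

/*
 * The invariant visibility code normally relies on: anything visible to
 * everyone must also be visible to the current snapshot.
 */
static bool
toy_consistent(uint64_t update_id, uint64_t oldest_id, const struct toy_snapshot *snap)
{
    return !toy_visible_all(update_id, oldest_id) || toy_visible(update_id, snap);
}
```

With a current global oldest ID of 150 and an update with ID 100, a live snapshot with snap_max 200 satisfies the invariant, while a very old checkpoint snapshot with snap_max 10 violates it: the update is visible_all but not visible.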

The fix is to bail out of __wt_cache_eviction_check early if we have a checkpoint-cursor transaction, since eviction can't do anything useful with it anyway, and to assert further down in __evict_review that we didn't get there with one.
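The shape of that fix can be sketched as follows. This is a minimal, self-contained model and not the actual WiredTiger patch: the sketch_* types and functions and the SKETCH_TXN_IS_CHECKPOINT flag are hypothetical stand-ins for the session, the transaction, WT_TXN_IS_CHECKPOINT, __wt_cache_eviction_check and __evict_review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_TXN_IS_CHECKPOINT. */
#define SKETCH_TXN_IS_CHECKPOINT 0x1u

struct sketch_txn {
    uint32_t flags;
};

struct sketch_session {
    struct sketch_txn txn;
    bool cache_needs_eviction; /* models "the cache wants application threads to help" */
    int pages_reviewed;        /* counts how often the review step was reached */
};

/* Models F_ISSET(txn, WT_TXN_IS_CHECKPOINT). */
static bool
sketch_txn_is_checkpoint(const struct sketch_txn *txn)
{
    return (txn->flags & SKETCH_TXN_IS_CHECKPOINT) != 0;
}

/* Models the review step: assert that no checkpoint-cursor transaction got this far. */
static void
sketch_evict_review(struct sketch_session *session)
{
    assert(!sketch_txn_is_checkpoint(&session->txn));
    ++session->pages_reviewed;
}

/*
 * Models the eviction check. Returns true if the calling thread was
 * borrowed for eviction, false if it returned without evicting.
 */
static bool
sketch_cache_eviction_check(struct sketch_session *session)
{
    /* Early bail-out: a checkpoint-cursor snapshot cannot evict anything useful. */
    if (sketch_txn_is_checkpoint(&session->txn))
        return false;
    if (!session->cache_needs_eviction)
        return false;
    sketch_evict_review(session);
    return true;
}
```

In this model a session whose transaction carries the checkpoint flag returns immediately and never reaches the review step, even under cache pressure. An ordinary session is still borrowed for eviction as before.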

Meanwhile, assert in the visibility checks that WT_READING_CHECKPOINT matches F_ISSET(txn, WT_TXN_IS_CHECKPOINT); if these don't match, something has gone off the rails. If a valid reason shows up to read another tree using a checkpoint-cursor transaction, this logic should probably be changed to use the WT_TXN_IS_CHECKPOINT flag. (The flag should probably get a longer name so it isn't confused with the transaction used to take a checkpoint, but that's a different issue.) However, in that case it will probably need explicit write-generation logic too, and that's likely to be a mess. I don't think any such cases exist, though.
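The consistency assertion described above might look roughly like this. It is a sketch under assumptions: ckpt_tree, ckpt_txn, CKPT_TXN_IS_CHECKPOINT and both functions are hypothetical names. The tree's is_checkpoint field stands in for whatever WT_READING_CHECKPOINT tests, and the flag test stands in for F_ISSET(txn, WT_TXN_IS_CHECKPOINT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_TXN_IS_CHECKPOINT. */
#define CKPT_TXN_IS_CHECKPOINT 0x1u

/* Hypothetical tree handle: is_checkpoint models the tree-side test. */
struct ckpt_tree {
    bool is_checkpoint;
};

/* Hypothetical transaction: flags models the transaction-side test. */
struct ckpt_txn {
    uint32_t flags;
};

/* True when the tree-side and transaction-side notions of "reading a checkpoint" agree. */
static bool
ckpt_state_consistent(const struct ckpt_tree *tree, const struct ckpt_txn *txn)
{
    bool txn_is_checkpoint = (txn->flags & CKPT_TXN_IS_CHECKPOINT) != 0;

    return tree->is_checkpoint == txn_is_checkpoint;
}

/*
 * Models the entry to a visibility check: assert the two views agree, then
 * report whether the checkpoint-cursor visibility rules should apply.
 */
static bool
ckpt_use_checkpoint_rules(const struct ckpt_tree *tree, const struct ckpt_txn *txn)
{
    assert(ckpt_state_consistent(tree, txn));
    return tree->is_checkpoint;
}
```

The mismatched case, a live tree read with a checkpoint-cursor transaction, is exactly what eviction from a checkpoint cursor used to produce. In this sketch it trips the assertion rather than silently taking the wrong visibility path.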

causes:
WT-9452 Don't attempt to evict using a checkpoint-cursor snapshot (Closed)
WT-9440 failed: checkpoint-stress-test on ubuntu2004-stress-tests [wiredtiger @ 45a5458d] (Closed)

related to:
WT-9344 Enforce some assumptions about checkpoint cursors (Closed)

Assignee: Keith Bostic (Inactive)
Reporter: David Holland
Votes: 0
Watchers: 2

Created: Jun 07 2022 11:36:23 PM UTC
Updated: Oct 29 2023 04:39:28 PM UTC
Resolved: Jun 09 2022 03:57:13 AM UTC