Type: Bug
Resolution: Fixed
Priority: Minor - P4
Fix Version/s: WT10.0.1, 5.2.0
Affects Version/s: None
Component/s: None
Labels: None
Sprint: None
Story Points: None

One of the subtests of test_inmem01 seems to fail sometimes under load on FLCS. This test writes data until it gets WT_CACHE_FULL, then deletes a quarter of it, then tries to rewrite the first 1000 rows, retrying on failure (which is tacitly assumed to be WT_CACHE_FULL). Under sufficient load, on FLCS, it will sometimes retry forever and fail.
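For illustration, here is a minimal sketch of the retry pattern described above, written so that only a cache-full failure is retried and the number of attempts is bounded. This is not the code in test_inmem01; `CacheFull`, `retry_on_cache_full`, and `max_retries` are hypothetical names standing in for however the test detects WT_CACHE_FULL.

```python
class CacheFull(Exception):
    """Hypothetical stand-in for an operation failing with WT_CACHE_FULL."""


def retry_on_cache_full(op, max_retries):
    """Call op() until it succeeds, retrying only on CacheFull.

    Any other exception propagates immediately instead of being
    silently treated as a cache-full condition.
    """
    for _ in range(max_retries):
        try:
            return op()
        except CacheFull:
            continue  # presumably eviction may free space; try again
    raise RuntimeError("still cache-full after %d attempts" % max_retries)
```

Bounding the retries makes the stuck case fail quickly with a clear message, and catching only the expected error avoids masking unrelated failures.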

First, note that deleting doesn't recover space on FLCS: deleted values are stored as zero, so reconciling the deletions doesn't make any more room in the cache. Reconciling updates, on the other hand, does save a lot of space, because an update structure is much larger than an on-disk value (which is at most one byte).

I think what's happening is that under sufficient load all the pages in the initial write get reconciled during that write, so by the time it stops, all the space that can be wrung out by reconciling that data already has been. Then the deletions accomplish nothing, and when it tries to do more updates there's no free space and nothing left to reclaim, so it gets WT_CACHE_FULL forever until the test gives up and fails.
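A toy model may make the accounting easier to see. This is only a sketch under invented numbers: `UPDATE_BYTES` is an assumed in-memory cost per unreconciled update, `fill_cache` and `reclaimable` are hypothetical helpers, and nothing here reflects the real cache implementation. The one-byte reconciled cost is the only figure taken from the description.

```python
UPDATE_BYTES = 64  # assumed in-memory cost of one unreconciled update
ONDISK_BYTES = 1   # FLCS value after reconciliation (at most one byte)


def fill_cache(capacity, reconcile_during_write):
    """Insert rows until the next update no longer fits.

    Returns (unreconciled_rows, reconciled_rows) at the point where the
    writer would see the cache as full.
    """
    unreconciled = reconciled = 0
    while True:
        used = unreconciled * UPDATE_BYTES + reconciled * ONDISK_BYTES
        if used + UPDATE_BYTES > capacity:
            if reconcile_during_write and unreconciled:
                # Under load, reconciliation shrinks the pending updates
                # to their on-disk size and the write continues.
                reconciled += unreconciled
                unreconciled = 0
                continue
            return unreconciled, reconciled
        unreconciled += 1


def reclaimable(unreconciled):
    """Bytes that reconciling the remaining updates could still free.

    Deleted FLCS values are stored as zero, so deletions add nothing here.
    """
    return unreconciled * (UPDATE_BYTES - ONDISK_BYTES)
```

In this model, with a capacity of 6400 bytes, a write with no reconciliation stops at 100 rows with 6300 bytes still reclaimable, while a write that reconciles as it goes stops at several thousand rows with nothing left to reclaim. The second case is the stuck state, and it is also the case with the much higher row count.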

I don't see any way to fix this, since we have no way to mark pages to keep them from being reconciled. (Otherwise, doing that on the first page of the initial write pass would do the trick.)

The idea I've come up with is to check how many rows the initial write generates (since the cache size is fixed, this indicates how much reconciliation has already happened). If it's too high, skip the test at that point; otherwise continue. This lets it run much of the time (it succeeded on all the runs testing WT-8287) but avoids generating noise if it's going to get stuck. While this seems a little cheesy, I think it's better than just turning off the test for FLCS, and that's probably the only other viable choice.
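A sketch of what that check could look like, assuming the test is a standard `unittest.TestCase` (or a subclass of one). The helper name `skip_if_already_reconciled` and the threshold `ROW_LIMIT` are hypothetical; the real cutoff would have to be chosen from the fixed cache size the test uses.

```python
import unittest

# Hypothetical cutoff: more rows than this in the initial write suggests
# most of the data was already reconciled before the cache filled up.
ROW_LIMIT = 5000


def skip_if_already_reconciled(test, rows_written, limit=ROW_LIMIT):
    """Skip the rest of the test if the initial write produced too many rows."""
    if rows_written > limit:
        test.skipTest(
            "initial write produced %d rows (limit %d); "
            "no space is likely to be reclaimable" % (rows_written, limit))
```

Called right after the initial write pass, this reports the subtest as skipped rather than failed when it would otherwise spin on WT_CACHE_FULL.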

Unrelatedly, I noticed that this and one of the other subtests say "... verify removes succeed" but don't actually check that the removes succeed. I propose to fix this while passing through.

Assignee: Unassigned
Reporter: David Holland
Votes: 0
Watchers: 1

Created: Nov 13 2021 04:42:14 AM UTC
Updated: Oct 29 2023 04:40:47 PM UTC
Resolved: Nov 13 2021 05:57:51 PM UTC
