
Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Test Python
Labels: None
Assigned Teams: Storage Engines - Foundations
Total Hours with Assigned Team: 162.39
Epic Link: SPM-4736
Sprint: SE Foundations - 2026-07-07
Story Points: 5

This ticket describes the features that were left out of the initial implementation of layered cursor stress testing (test_layered_cursor_stress.py), which was added to the repository as part of WT-17782.

It’s well known that the best is often the enemy of the good. We faced strict time constraints while implementing this test, so we decided to scope out a number of interesting but non-essential ideas. They are described below, roughly in order of priority as I see it.

As the code shows, this test generates random cursor operations and executes them on both the leader and the follower. An ASC cursor serves as the reference implementation, and after every operation we verify the operation result, key, and value on every cursor. The test is intentionally read-heavy: it focuses on preserving cursor positioning across many operations while exercising rare edge cases that are unlikely to come up in typical customer workloads.
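The per-operation check described above can be sketched as follows. This is a minimal sketch, assuming each cursor's outcome is captured as a (result, key, value) triple. `Observation`, `verify_step`, and `CursorMismatch` are hypothetical names, not taken from test_layered_cursor_stress.py. The reference cursor's triple is treated as the expected value, and the first differing field is reported together with the step number and operation. Because the run is deterministic, that is enough to replay and pinpoint the failure.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Observation:
    """What one cursor reported after an operation (hypothetical shape)."""
    ret: Any
    key: Optional[Any]
    value: Optional[Any]


class CursorMismatch(AssertionError):
    """Raised when a cursor disagrees with the reference cursor."""


def verify_step(step: int, op: str,
                observations: Mapping[str, Observation],
                reference: str) -> None:
    """Compare every cursor's (ret, key, value) against the reference cursor."""
    expected = observations[reference]
    for name, got in observations.items():
        if name == reference:
            continue
        # Check fields in a fixed order so the report is stable across runs.
        for field in ("ret", "key", "value"):
            want, have = getattr(expected, field), getattr(got, field)
            if have != want:
                raise CursorMismatch(
                    f"step {step} op {op!r}: cursor {name!r} has {field}={have!r}, "
                    f"reference {reference!r} has {field}={want!r}")
```

The harness would call `verify_step` once per generated operation, passing one observation per cursor.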

Here is what I think is left, in priority order:

The most important item is a separate configuration that runs the test with a random seed and configuration, so that every new run extends coverage. Because the test is completely deterministic, any failure found this way should be easy to reproduce. After finding a new issue, we can add a new fixed configuration, as we usually do with test/format.
The second most important item is running the bulk scenarios through a separate cursor in a separate session and transaction, so the inserts and removes arrive like writes from another thread (a production-like concurrent writer on the follower). For the asc-vs-dsc comparison to stay valid, the two tested cursors must always share one snapshot. This first requires disabling the no-txn (autocommit, read-committed) read operations; otherwise asc and dsc refresh their snapshots at different points. This is especially interesting under the read-committed and read-uncommitted isolation levels.
The next item is adding a second mode to the eviction scenario. Instead of only evicting everything right after the checkpoint, it would also perform random evictions of 20/40/60/80/100% while a cursor is held open, so that only the ingest entries that are permitted to be removed get removed.
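A minimal sketch of the first item, assuming a Python harness. The seed is either read from a hypothetical `LAYERED_STRESS_SEED` environment variable (to replay a failure) or drawn fresh and printed. Everything else is derived from that seed, including the eviction mode and percentage from the third item. `StressConfig`, `make_config`, `generate_ops`, and the operation names are illustrative and not taken from test_layered_cursor_stress.py.

```python
import os
import random
from dataclasses import dataclass
from typing import Iterator, Mapping, Tuple

# Hypothetical variable used to replay a failing run with a fixed seed.
SEED_ENV = "LAYERED_STRESS_SEED"

# Illustrative knobs; the real test's operation set and options may differ.
OPS = ("search", "search_near", "next", "prev", "insert", "update", "remove", "reset")
EVICTION_MODES = ("after_checkpoint", "random")
EVICTION_PERCENTS = (20, 40, 60, 80, 100)


@dataclass(frozen=True)
class StressConfig:
    seed: int
    num_ops: int
    eviction_mode: str
    eviction_percent: int


def pick_seed(environ: Mapping[str, str] = os.environ) -> int:
    """Reuse a seed from the environment, or draw a fresh one."""
    value = environ.get(SEED_ENV)
    if value is not None:
        return int(value)
    return random.SystemRandom().randrange(2**32)


def make_config(seed: int, num_ops: int = 1000) -> StressConfig:
    """Derive the whole run configuration from the seed alone."""
    rng = random.Random(f"{seed}:config")
    mode = rng.choice(EVICTION_MODES)
    # "after_checkpoint" models the existing evict-everything behavior.
    percent = rng.choice(EVICTION_PERCENTS) if mode == "random" else 100
    return StressConfig(seed, num_ops, mode, percent)


def generate_ops(config: StressConfig) -> Iterator[Tuple[str, int]]:
    """Yield (operation, key) pairs. A separate RNG stream keeps the op
    sequence independent of how many draws make_config() consumes."""
    rng = random.Random(f"{config.seed}:ops")
    for _ in range(config.num_ops):
        yield rng.choice(OPS), rng.randrange(10_000)


if __name__ == "__main__":
    seed = pick_seed()
    config = make_config(seed)
    # Print the seed first so a failure can be replayed exactly.
    print(f"seed={seed} (rerun with {SEED_ENV}={seed}) config={config}")
    for op, key in generate_ops(config):
        pass  # execute op on the leader and follower cursors here
```

Because the seed is the only input, a failing random run can be promoted to a fixed configuration simply by recording its seed.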

The next points are more about extending the coverage of different operations:

Test tombstone-prefixed values (to cover _clayered_deleted(de/en)code).
Add operations with overwrite on and off.
Add setting bounds on cursors.
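The overwrite and bounds items change which results count as correct. A small dict-backed model can make those expected outcomes explicit. This is a minimal sketch, assuming WiredTiger's documented cursor `overwrite` semantics. With overwrite off, inserting an existing key returns WT_DUPLICATE_KEY, and updating or removing a missing key returns WT_NOTFOUND. With overwrite on (the default), insert and update act as upserts, and removing a missing key succeeds. The bounded scan models a next() or prev() walk restricted to optional lower and upper bounds, each with its own inclusive flag. `ReferenceStore` and the string result codes are hypothetical placeholders. They are not the real integer error codes or the test's classes, and the test itself still uses an ASC cursor as its reference.

```python
import bisect
from typing import Dict, List, Optional

# Placeholder result codes named after WiredTiger's; not the real integer values.
OK = "OK"
WT_NOTFOUND = "WT_NOTFOUND"
WT_DUPLICATE_KEY = "WT_DUPLICATE_KEY"


class ReferenceStore:
    """Hypothetical model predicting write results and bounded scans."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def insert(self, key: str, value: str, overwrite: bool) -> str:
        # overwrite off: inserting an existing key fails and keeps the old value.
        if not overwrite and key in self.data:
            return WT_DUPLICATE_KEY
        self.data[key] = value
        return OK

    def update(self, key: str, value: str, overwrite: bool) -> str:
        # overwrite off: the key must exist; overwrite on: behaves as an upsert.
        if not overwrite and key not in self.data:
            return WT_NOTFOUND
        self.data[key] = value
        return OK

    def remove(self, key: str, overwrite: bool) -> str:
        # overwrite on: removing a missing key is not an error.
        if key not in self.data:
            return OK if overwrite else WT_NOTFOUND
        del self.data[key]
        return OK

    def scan(self, lower: Optional[str] = None, upper: Optional[str] = None,
             lower_inclusive: bool = True, upper_inclusive: bool = True,
             reverse: bool = False) -> List[str]:
        """Keys a next() walk (prev() if reverse) should visit within the bounds."""
        keys = sorted(self.data)
        if lower is None:
            lo = 0
        elif lower_inclusive:
            lo = bisect.bisect_left(keys, lower)
        else:
            lo = bisect.bisect_right(keys, lower)
        if upper is None:
            hi = len(keys)
        elif upper_inclusive:
            hi = bisect.bisect_right(keys, upper)
        else:
            hi = bisect.bisect_left(keys, upper)
        window = keys[lo:hi]  # empty when the bounds exclude everything
        return window[::-1] if reverse else window
```

Keeping the bound handling in a single function makes it easy to cross-check the inclusive and exclusive edges, which is where positioning bugs tend to appear.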

A couple of features are currently disabled because of existing bugs. They are all marked with FIXME comments, so they are out of scope for this ticket.

Is related to: WT-17782 Add cursor-oriented stress testing for layered (disaggregated) cursors (Closed)

Assignee:: [DO NOT USE] Backlog - Storage Engines Team
Reporter:: Ivan Kochin
Votes:: 0
Watchers:: 1

Created:: Jun 15 2026 11:00:51 PM UTC
Updated:: Jun 22 2026 06:34:46 AM UTC
