Layered cursors stress testing - follow up


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Test Python
    • None
    • Storage Engines - Foundations
    • 162.39
    • SE Foundations - 2026-07-07
    • 5

      This ticket was created to describe the features that were left out during the implementation of layered cursor stress testing (test_layered_cursor_stress.py), which was initially added to the repository as part of WT-17782.

      It’s well known that the best is often the enemy of the good. During the implementation of this test we faced strict time constraints, so it was decided to scope out a number of interesting but non-essential ideas. They are all described below, roughly in the order of priority as I see it.

      As you can see in the code, this test generates random cursor operations that are then executed on both the leader and the follower. An ASC cursor is used as the reference implementation, and after every operation we verify the operation result, key, and value on every cursor. The test is intentionally read-heavy, with a focus on preserving cursor positioning across many operations while exercising rare edge cases that are unlikely to be encountered in typical customer workloads.
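The verification loop described above can be illustrated with a small standalone sketch. This is not the code from test_layered_cursor_stress.py: `ListCursor`, `ScanCursor`, `NOTFOUND` and `run_differential` are hypothetical stand-ins that only show the idea of replaying the same seeded random operations on a reference cursor and a tested cursor and comparing the result, key, and value after every step.

```python
import bisect
import random

NOTFOUND = "NOTFOUND"  # placeholder result code used only in this sketch


class ListCursor:
    """Toy reference cursor: an index into a sorted list of (key, value)."""

    def __init__(self, items):
        self.items = sorted(items)
        self.pos = None  # None means the cursor is unpositioned

    def _result(self):
        if self.pos is None:
            return (NOTFOUND, None, None)
        key, value = self.items[self.pos]
        return (0, key, value)

    def next(self):
        nxt = 0 if self.pos is None else self.pos + 1
        self.pos = nxt if nxt < len(self.items) else None
        return self._result()

    def prev(self):
        prv = len(self.items) - 1 if self.pos is None else self.pos - 1
        self.pos = prv if prv >= 0 else None
        return self._result()

    def search(self, key):
        # (key,) sorts before any (key, value), so this finds the first match.
        i = bisect.bisect_left(self.items, (key,))
        found = i < len(self.items) and self.items[i][0] == key
        self.pos = i if found else None
        return self._result()


class ScanCursor:
    """Independent toy cursor: remembers the current key and scans for neighbours."""

    def __init__(self, items):
        self.data = dict(items)
        self.key = None

    def _result(self):
        if self.key is None:
            return (NOTFOUND, None, None)
        return (0, self.key, self.data[self.key])

    def next(self):
        later = [k for k in self.data if self.key is None or k > self.key]
        self.key = min(later) if later else None
        return self._result()

    def prev(self):
        earlier = [k for k in self.data if self.key is None or k < self.key]
        self.key = max(earlier) if earlier else None
        return self._result()

    def search(self, key):
        self.key = key if key in self.data else None
        return self._result()


def run_differential(seed, n_ops=200, n_keys=20):
    """Replay seeded random operations on both cursors and compare every step."""
    rng = random.Random(seed)
    items = [(k, f"value{k}") for k in rng.sample(range(100), n_keys)]
    reference, tested = ListCursor(items), ScanCursor(items)
    for step in range(n_ops):
        op = rng.choice(["next", "prev", "search"])
        args = (rng.randrange(100),) if op == "search" else ()
        expected = getattr(reference, op)(*args)
        actual = getattr(tested, op)(*args)
        assert expected == actual, f"seed={seed} step={step} op={op}{args}"
    return n_ops
```

Because every choice comes from `random.Random(seed)`, a failing step can be replayed by calling `run_differential` with the same seed.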

      Here is what I think is left, in priority order:

      • The most important item is a separate configuration that runs the test with a random seed/configuration, so that every new run extends the coverage. Since the test is completely deterministic for a given seed/configuration, every failure of such a run should be easy to reproduce, so after finding a new issue we can simply add a new fixed configuration, as we usually do with test/format.
      • The second most important item is to run the bulk scenarios through a separate cursor in a separate session and transaction, so that the inserts/removes arrive like writes from another thread (a production-like concurrent follower writer). For the asc-vs-dsc comparison to stay valid, the two tested cursors must always share one snapshot, so this first requires disabling the no-txn (autocommit read-committed) read ops; otherwise asc and dsc refresh at different points. This is especially interesting under the read-committed and read-uncommitted isolation levels.
      • The next item is to have two modes for the eviction scenario: in addition to evicting everything right after the checkpoint, also do random evictions of 20/40/60/80/100% while a cursor is open, so that we remove only those ingest entries that are permitted to be removed.
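As a rough illustration of the first item, a random-seed configuration could be derived as sketched below. All names here (`pick_seed`, `generate_config`, and the configuration keys) are hypothetical and not taken from the existing test; the eviction percentages and isolation levels are the ones mentioned in the list above. The point is only that a logged seed fully determines the configuration, so a failing run can later be pinned as a fixed configuration.

```python
import os
import random

EVICT_PERCENTS = [20, 40, 60, 80, 100]
ISOLATIONS = ["snapshot", "read-committed", "read-uncommitted"]


def pick_seed(fixed_seed=None):
    """Return the fixed seed if one is given, otherwise a fresh random one."""
    if fixed_seed is not None:
        return fixed_seed
    return int.from_bytes(os.urandom(8), "big")


def generate_config(seed):
    """Derive a whole (hypothetical) test configuration from a single seed."""
    rng = random.Random(seed)
    return {
        "seed": seed,
        "n_ops": rng.randrange(1_000, 10_001),
        "evict_percent": rng.choice(EVICT_PERCENTS),
        "isolation": rng.choice(ISOLATIONS),
    }


if __name__ == "__main__":
    config = generate_config(pick_seed())
    # Log the seed first, so a failure can be reproduced with pick_seed(seed).
    print(f"layered cursor stress config: {config}")
```

Promoting a failure to a regression case would then amount to passing the logged seed as `fixed_seed`.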

      The next points are more about extending the coverage of different operations:

      • Test tombstone-prefixed values (to cover _clayered_deleted(de/en)code)
      • Add operations with overwrite on/off
      • Add setting bounds to cursors
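For the bounds item, the reference side of the check could be as simple as the following sketch. It does not call any WiredTiger API; `keys_within_bounds` is a hypothetical helper that computes which keys a bounded scan would be expected to return, assuming optional lower and upper bounds that are each either inclusive or exclusive.

```python
import bisect


def keys_within_bounds(sorted_keys, lower=None, upper=None,
                       lower_inclusive=True, upper_inclusive=True):
    """Return the keys of a sorted list that fall inside the given bounds."""
    lo = 0
    if lower is not None:
        if lower_inclusive:
            lo = bisect.bisect_left(sorted_keys, lower)
        else:
            lo = bisect.bisect_right(sorted_keys, lower)
    hi = len(sorted_keys)
    if upper is not None:
        if upper_inclusive:
            hi = bisect.bisect_right(sorted_keys, upper)
        else:
            hi = bisect.bisect_left(sorted_keys, upper)
    # An empty slice is returned when the bounds exclude every key.
    return sorted_keys[lo:hi]
```

A bounded cursor scan in the test could then be compared against this list in the same way the unbounded operations are compared against the reference cursor.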

      A couple of features are currently disabled because of existing bugs, but they are all marked with fixme comments, so they should be out of scope for this ticket.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Ivan Kochin
            Votes:
            0
            Watchers:
            1

              Created:
              Updated: