Increasing the number of situations in test_layered91.py causes an abort because the cache is stuck.


    • Storage Engines - Foundations

      We have a Python test, test_layered90.py, that generates a large number of layered table states and creates one table for each state. Each table can have up to 5 keys, which results in 2911 unique layered tables.

      We want to allow tables to have up to 6 keys, which raises the number of layered tables to 11011 (about 3.8 times as many). This causes the test to abort because the cache is stuck for too long (original discussion):

      $ python3 ../test/suite/run.py -v 2 test_layered91
      [pid:99118]: None ... [pid:99118]: test_layered91.test_layered91.test_layered91: starting
      [pid:99118]: Generate unique situations for testing.
      [pid:99118]: Generated 11011 unique situations for testing.
      [pid:99118]: Create layered tables for each situation.
      [pid:99118]: Done - Create layered tables for each situation.
      [pid:99118]: Populate stable keys (S, B, R, X) on the leader.
      zsh: abort      python3 ../test/suite/run.py -v 2 test_layered91
      [1775779846:518218][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: int __evict_server(WT_SESSION_IMPL *, _Bool *), 543: Cache stuck for too long, giving up: Operation timed out
      [1775779847:153301][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_ERROR_RETURNS][ERROR]: int __wt_btcur_next(WT_CURSOR_BTREE *, _Bool), 795: Error at src/btree/bt_curnext.c:795: "WT_NOTFOUND" failed: WT_NOTFOUND: item not found
      [1775779847:153318][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_ERROR_RETURNS][ERROR]: int __curfile_next(WT_CURSOR *), 186: Error at src/cursor/cur_file.c:186: "ret" failed: WT_NOTFOUND: item not found
      [1775779847:153328][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_ERROR_RETURNS][ERROR]: int __evict_thread_run(WT_SESSION_IMPL *, WT_THREAD *), 336: Error at src/evict/evict_lru.c:336: "ret" failed: Operation timed out
      [1775779847:153336][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: int __evict_thread_run(WT_SESSION_IMPL *, WT_THREAD *), 359: eviction thread error: Operation timed out
      [1775779847:153346][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: int __evict_thread_run(WT_SESSION_IMPL *, WT_THREAD *), 359: the process must exit and restart: WT_PANIC: WiredTiger library panic
      [1775779847:153354][79855:0x16f88f000], test_layered91.test_layered91.test_layered91(nbatches=1), file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: void __wt_abort(WT_SESSION_IMPL *), 29: aborting WiredTiger library 
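      To follow how the eviction timeout turns into the abort, it can help to pull the function, source line and message out of each WiredTiger line in the log above. The following is a minimal Python sketch under one assumption: that every error line has the shape seen in this log ([sec:usec][pid:thread], session name, file URI, thread name, [category][level], function signature, line, message). LINE_RE, WtError, parse_line and error_chain are hypothetical helpers for illustration, not part of WiredTiger or its test suite.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Hypothetical pattern, inferred only from the log lines in this ticket:
# [sec:usec][pid:thread], <session>, <file URI>, <thread name>: [<category>][<level>]: <signature>, <line>: <message>
LINE_RE = re.compile(
    r"^\[(?P<sec>\d+):(?P<usec>\d+)\]\[(?P<pid>\d+):(?P<tid>0x[0-9a-fA-F]+)\], "
    r"(?P<session>.*?), (?P<uri>file:[^,]+), (?P<thread>[^:]+): "
    r"\[(?P<category>\w+)\]\[(?P<level>\w+)\]: "
    r"(?P<signature>[^()]*\([^()]*\)), (?P<line>\d+): (?P<message>.*)$"
)


class WtError(NamedTuple):
    function: str  # bare function name, e.g. "__evict_server"
    line: int      # source line number reported before the message
    message: str   # text after "<line>: ", with surrounding whitespace removed


def parse_line(text: str) -> Optional[WtError]:
    """Parse one log line; return None for lines that do not match (e.g. shell output)."""
    m = LINE_RE.match(text.strip())
    if m is None:
        return None
    # A signature looks like "int __evict_server(WT_SESSION_IMPL *, _Bool *)";
    # the function name is the identifier immediately before "(".
    name_match = re.search(r"(\w+)\(", m["signature"])
    name = name_match.group(1) if name_match else m["signature"]
    return WtError(name, int(m["line"]), m["message"].strip())


def error_chain(lines: Iterable[str]) -> List[WtError]:
    """Return the parsed errors in log order, skipping non-matching lines."""
    return [err for err in (parse_line(line) for line in lines) if err is not None]
```

      Applied to the seven WiredTiger lines in the log above, error_chain returns, in order: __evict_server (543), __wt_btcur_next (795), __curfile_next (186), __evict_thread_run (336, then 359 twice) and __wt_abort (29). Lines such as the zsh abort message are skipped rather than raising, so the whole captured output can be passed in unchanged.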

      This ticket is to investigate why this occurs and, if possible, to fix the issue and raise max_len for the tables in test_layered90.py from 5 to 6.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Alexander Pullen
            Votes:
            0
            Watchers:
            3