format-stress-test-disagg-switch: cache stuck for too long, eviction thread panics

    • Type: Build Failure
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Cache and Eviction

      format-stress-test-disagg-switch-data-validation-1 on amazon2023-stress-tests-arm64

      Host: i-01747fe9583d885a7
      Project: wiredtiger
      Commit: 4d76b33b
      Please refer to BF(G) Playbook for instructions on handling BF and BFG tickets as well as Auto-Resolution Rules

      Task Logs:

      format-stress-test-disagg-switch-data-validation-1 task_log

      Logs:

      /opt/mongodbtoolchain/revisions/93d85cc1a00ca1d53fcc7d4ca790f19a4e5dd542/scripts/products.sh: line 13: /opt/mongodbtoolchain/revisions/93d85cc1a00ca1d53fcc7d4ca790f19a4e5dd542/scripts/installer.sh: No such file or directory
      

      Logs:

      [417/672] Building C object test/csuite/CMakeFiles/test_wt16990_disagg_checkpoint_panic.dir/wt16990_disagg_checkpoint_panic/main.c.o
      [420/672] Linking CXX executable test/csuite/wt16990_disagg_checkpoint_panic/test_wt16990_disagg_checkpoint_panic
      

      Logs:

      [1782303423:696204][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: __evict_server, 278: Cache stuck for too long, giving up: Connection timed out
      

      Logs:

      0x52b4d1018000:transaction state dump
      0x52b4d1018000:current ID: 654395
      0x52b4d1018000:last running ID: 654395
      0x52b4d1018000:metadata_pinned ID: 654395
      0x52b4d1018000:oldest ID: 654395
      0x52b4d1018000:durable timestamp: (0, 6587402)
      0x52b4d1018000:oldest timestamp: (0, 6587377)
      0x52b4d1018000:pinned timestamp: (0, 6587377)
      0x52b4d1018000:stable timestamp: (0, 6587377)
      0x52b4d1018000:stable disaggregated schema epoch: (0, 0)
      0x52b4d1018000:has_durable_timestamp: yes
      0x52b4d1018000:has_oldest_timestamp: yes
      0x52b4d1018000:has_pinned_timestamp: yes
      0x52b4d1018000:has_stable_timestamp: yes
      0x52b4d1018000:has_stable_disaggregated_schema_epoch: no
      0x52b4d1018000:oldest_is_pinned: yes
      0x52b4d1018000:stable_is_pinned: yes
      0x52b4d1018000:checkpoint running: no
      0x52b4d1018000:checkpoint generation: 1
      0x52b4d1018000:checkpoint pinned ID: 0
      0x52b4d1018000:checkpoint txn ID: 0
      0x52b4d1018000:session count: 20
      0x52b4d1018000:Transaction state of active sessions:
      0x52b4d1018000:session ID: 19, txn ID: 0, pinned ID: 654395, metadata pinned ID: 0, name: disagg-step-up
      0x52b4d1018000:transaction id: 0, mod count: 0, snap min: 654395, snap max: 654395, snapshot count: 0, snapshot: [], commit_timestamp: (0, 0), durable_timestamp: (0, 0), first_commit_timestamp: (0, 0), prepare_timestamp: (0, 0), prepared id: 0, pinned_durable_timestamp: (0, 0), read_timestamp: (0, 0), checkpoint LSN: [0,0], full checkpoint: false, flags: 0x00000004, isolation: WT_ISO_SNAPSHOT, last saved error code: 0, last saved sub-level error code: -32000, last saved error message: last API call was successful
      0x52b4d1018000:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      0x52b4d1018000:cache dump
      0x52b4d1018000:cache full: no
      0x52b4d1018000:cache clean check: yes (98.510%)
      0x52b4d1018000:cache dirty check: yes (56.418%)
      0x52b4d1018000:cache updates check: yes (38.916%)
      0x52b4d1018000:file:T00002.wt_stable(<live>):
      0x52b4d1018000:internal: 175 pages, 4648.19 KB, 109/66 clean/dirty pages, 3131.09/1517.10 clean / dirty KB, 162.46 KB max page, 87.50 KB max dirty page
      0x52b4d1018000:leaf: 12310 pages, 470368.19 KB, 4057/8253 clean/dirty pages, 119321.85 /351046.33 /140870.00 clean/dirty/updates KB, 465.05 KB max page, 465.05 KB max dirty page
      0x52b4d1018000:file:T00003.wt_stable/WiredTigerCheckpoint.25(<live>):
      0x52b4d1018000:internal: 161 pages, 3917.86 KB, 161/0 clean/dirty pages, 3917.86/0.00 clean / dirty KB, 161.96 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:leaf: 13424 pages, 434938.94 KB, 13424/0 clean/dirty pages, 434938.94 /0.00 /0.00 clean/dirty/updates KB, 183.64 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:file:T00003.wt_ingest(<live>):
      0x52b4d1018000:internal: 1 pages, 33.98 KB, 0/1 clean/dirty pages, 0.00/33.98 clean / dirty KB, 33.98 KB max page, 33.98 KB max dirty page
      0x52b4d1018000:leaf: 193 pages, 295811.21 KB, 0/193 clean/dirty pages, 0.00 /295811.21 /279110.29 clean/dirty/updates KB, 2710.56 KB max page, 2710.56 KB max dirty page
      0x52b4d1018000:file:T00002.wt_stable/WiredTigerCheckpoint.25(<live>):
      0x52b4d1018000:internal: 183 pages, 4915.31 KB, 183/0 clean/dirty pages, 4915.31/0.00 clean / dirty KB, 162.46 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:leaf: 13142 pages, 426691.22 KB, 13142/0 clean/dirty pages, 426691.22 /0.00 /0.00 clean/dirty/updates KB, 186.65 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:file:T00002.wt_ingest(<live>):
      0x52b4d1018000:internal: 1 pages, 1008.62 KB, 0/1 clean/dirty pages, 0.00/1008.62 clean / dirty KB, 1008.62 KB max page, 1008.62 KB max dirty page
      0x52b4d1018000:leaf: 5825 pages, 298686.10 KB, 0/5825 clean/dirty pages, 0.00 /298686.10 /284535.19 clean/dirty/updates KB, 885.35 KB max page, 885.35 KB max dirty page
      0x52b4d1018000:file:T00001.wt(<live>):
      0x52b4d1018000:internal: 15568 pages, 98203.17 KB, 2697/12871 clean/dirty pages, 16090.15/82113.02 clean / dirty KB, 241.21 KB max page, 88.11 KB max dirty page
      0x52b4d1018000:leaf: 451245 pages, 668340.05 KB, 170187/281058 clean/dirty pages, 156564.86 /511775.18 /289624.82 clean/dirty/updates KB, 685.30 KB max page, 685.30 KB max dirty page
      0x52b4d1018000:file:WiredTigerSharedHS.wt_stable/WiredTigerCheckpoint.24(<live>):
      0x52b4d1018000:internal: 1 pages, 2.19 KB, 1/0 clean/dirty pages, 2.19/0.00 clean / dirty KB, 2.19 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:leaf: 0 pages
      0x52b4d1018000:file:WiredTigerShared.wt_stable/WiredTigerCheckpoint.25(<live>):
      0x52b4d1018000:internal: 1 pages, 0.53 KB, 1/0 clean/dirty pages, 0.53/0.00 clean / dirty KB, 0.53 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:leaf: 1 pages, 6.45 KB, 1/0 clean/dirty pages, 6.45 /0.00 /0.00 clean/dirty/updates KB, 6.45 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:file:WiredTigerSharedHS.wt_stable(<live>):
      0x52b4d1018000:internal: 94 pages, 7742.52 KB, 88/6 clean/dirty pages, 7517.52/225.00 clean / dirty KB, 322.59 KB max page, 41.34 KB max dirty page
      0x52b4d1018000:leaf: 5469 pages, 293262.97 KB, 3970/1499 clean/dirty pages, 137429.86 /155833.11 /112587.20 clean/dirty/updates KB, 1213.16 KB max page, 1213.16 KB max dirty page
      0x52b4d1018000:file:WiredTigerHS.wt(<live>):
      0x52b4d1018000:internal: 23 pages, 15613.77 KB, 6/17 clean/dirty pages, 2969.89/12643.88 clean / dirty KB, 1523.57 KB max page, 1523.57 KB max dirty page
      0x52b4d1018000:leaf: 659 pages, 43538.96 KB, 237/422 clean/dirty pages, 13402.39 /30136.58 /26781.50 clean/dirty/updates KB, 1963.17 KB max page, 1963.17 KB max dirty page
      0x52b4d1018000:file:WiredTiger.wt(<live>):
      0x52b4d1018000:internal: 1 pages, 0.45 KB, 1/0 clean/dirty pages, 0.45/0.00 clean / dirty KB, 0.45 KB max page, 0.00 KB max dirty page
      0x52b4d1018000:leaf: 1 pages, 16.18 KB, 0/1 clean/dirty pages, 0.00 /16.18 /2.00 clean/dirty/updates KB, 16.18 KB max page, 16.18 KB max dirty page
      0x52b4d1018000:cache dump: total found: 3235.51 MB vs tracked inuse 2802.07 MB
      0x52b4d1018000:total dirty bytes: 1700.05 MB vs tracked dirty 1700.05 MB
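
For triage, the per-file dirty figures in the cache dump above can be ranked with a short script. This is a minimal sketch, not part of WiredTiger or the test harness: the function name `dirty_kb_by_file` and both regular expressions are assumptions based only on the line layout visible in this dump, and the sample text is copied from two of the files listed above.

```python
import re

# Sample lines copied verbatim from the cache dump (session prefix included).
SAMPLE = """\
0x52b4d1018000:file:T00003.wt_ingest(<live>):
0x52b4d1018000:internal: 1 pages, 33.98 KB, 0/1 clean/dirty pages, 0.00/33.98 clean / dirty KB, 33.98 KB max page, 33.98 KB max dirty page
0x52b4d1018000:leaf: 193 pages, 295811.21 KB, 0/193 clean/dirty pages, 0.00 /295811.21 /279110.29 clean/dirty/updates KB, 2710.56 KB max page, 2710.56 KB max dirty page
0x52b4d1018000:file:WiredTiger.wt(<live>):
0x52b4d1018000:internal: 1 pages, 0.45 KB, 1/0 clean/dirty pages, 0.45/0.00 clean / dirty KB, 0.45 KB max page, 0.00 KB max dirty page
0x52b4d1018000:leaf: 1 pages, 16.18 KB, 0/1 clean/dirty pages, 0.00 /16.18 /2.00 clean/dirty/updates KB, 16.18 KB max page, 16.18 KB max dirty page
"""

# Assumed layout: "<session ptr>:file:<name>(<live>):" starts a per-file section.
FILE_RE = re.compile(r"^\s*0x[0-9a-f]+:(file:.+)\(<live>\):\s*$")

# Assumed layout: "a/b clean / dirty KB" (internal pages) or
# "a /b /c clean/dirty/updates KB" (leaf pages). Group 2 is the dirty KB value.
KB_RE = re.compile(
    r"([\d.]+)\s*/\s*([\d.]+)(?:\s*/\s*[\d.]+)?"
    r"\s+clean\s*/\s*dirty(?:/updates)?\s+KB"
)


def dirty_kb_by_file(text):
    """Return (file name, dirty KB) pairs, largest dirty footprint first."""
    totals = {}
    current = None
    for line in text.splitlines():
        m = FILE_RE.match(line)
        if m:
            current = m.group(1)
            totals.setdefault(current, 0.0)
            continue
        m = KB_RE.search(line)
        if m and current is not None:
            # Internal and leaf dirty KB are summed per file.
            totals[current] += float(m.group(2))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, dirty_kb in dirty_kb_by_file(SAMPLE):
        print(f"{dirty_kb:12.2f} KB dirty  {name}")
```

Summing the internal and leaf dirty figures by hand from the dump above, file:T00001.wt carries the most dirty data (82113.02 KB + 511775.18 KB), followed by file:T00002.wt_stable, file:T00002.wt_ingest and file:T00003.wt_ingest.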
      

      Logs:

      [1782303423:808569][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_ERROR_RETURNS][ERROR]: __wt_btcur_next, 795: Error at src/btree/bt_curnext.c:795: "WT_NOTFOUND" failed: WT_NOTFOUND: item not found
      [1782303423:808582][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_ERROR_RETURNS][ERROR]: __curfile_next, 192: Error at src/cursor/cur_file.c:192: "ret" failed: WT_NOTFOUND: item not found
      [1782303423:808585][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_ERROR_RETURNS][ERROR]: __evict_thread_run, 98: Error at src/evict/evict_thread.c:98: "ret" failed: Connection timed out
      [1782303423:808590][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: __evict_thread_run, 121: eviction thread error: Connection timed out
      [1782303423:808592][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: __evict_thread_run, 121: the process must exit and restart: WT_PANIC: WiredTiger library panic
      [1782303423:808596][36762:0xffff98dbdbc0], t, file:WiredTigerSharedHS.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: __wt_abort, 29: aborting WiredTiger library
      

      Logs:

      #0  0x0000ffff9b6c4454 in __pthread_kill_implementation () from /lib64/libc.so.6
      #1  0x0000ffff9b67b320 [PAC] in raise () from /lib64/libc.so.6
      #2  0x0000ffff9b662224 [PAC] in abort () from /lib64/libc.so.6
      #3  0x0000ffff9ba55f84 [PAC] in __wt_abort (session=session@entry=0x52b4d1018000) at /data/mci/37d23a53acaaa44e392d3d56cd7e25d6/wiredtiger/src/os_common/os_abort.c:32
      #4  0x0000ffff9bb02630 in __wt_panic_func (session=session@entry=0x52b4d1018000, error=error@entry=110, func=func@entry=0xffff9bc08e48 <__PRETTY_FUNCTION__.6> "__evict_thread_run", line=line@entry=121, category=category@entry=WT_VERB_DEFAULT, fmt=fmt@entry=0xffff9bba33f0 "eviction thread error") at /data/mci/37d23a53acaaa44e392d3d56cd7e25d6/wiredtiger/src/support/err.c:633
      #5  0x0000ffff9ba0e164 in __evict_thread_run (session=0x52b4d1018000, thread=0x52b4b479cdc0) at /data/mci/37d23a53acaaa44e392d3d56cd7e25d6/wiredtiger/src/evict/evict_thread.c:121
      #6  0x0000ffff9bb1e5d4 in __thread_run (arg=0x52b4b479cdc0) at /data/mci/37d23a53acaaa44e392d3d56cd7e25d6/wiredtiger/src/support/thread_group.c:32
      #7  0x0000ffff9b6c2834 in start_thread () from /lib64/libc.so.6
      #8  0x0000ffff9b666e5c [PAC] in thread_start () from /lib64/libc.so.6
      


            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            xgen-buildbaron-user
            Votes:
            0
            Watchers:
            2
