Disagg: cannot read page after python test copies a directory (test_checkpoint_snapshot01)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Test Python
    • Storage Engines, Storage Engines - Persistence
    • SE Persistence backlog

      Running the following command:

      python3 ../test/suite/run.py --hook disagg -v 2 checkpoint_snapshot01 -s 2

      fails because of this unexpected output:

      [1752169866:908160][4560:0xfffff7ff54c0], test_checkpoint_snapshot01.test_checkpoint_snapshot01.test_checkpoint_snapshot(row_string), file:WiredTigerShared.wt_stable, close_ckpt: [WT_VERB_READ][NOTICE]: retry #0 for page_id 101, lsn 4, checkpoint_id 1, reconciliation_id 0, size 63, checksum 17e9acd4
      [1752169866:918258][4560:0xfffff7ff54c0], test_checkpoint_snapshot01.test_checkpoint_snapshot01.test_checkpoint_snapshot(row_string), file:WiredTigerShared.wt_stable, close_ckpt: [WT_VERB_READ][NOTICE]: retry #1 for page_id 101, lsn 4, checkpoint_id 1, reconciliation_id 0, size 63, checksum 17e9acd4
      

      The retries continue well beyond the two shown here. The test does some work in one connection, takes a checkpoint, copies the home directory to RESTART, and then opens a second connection on RESTART. When the RESTART connection closes, the shutdown checkpoint tries to checkpoint file:WiredTigerShared.wt_stable. That file is apparently closed at this point, because the checkpoint has to open it first, and the open fails because a page cannot be read (presumably the root page).
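
When triaging output like this, it can help to collapse the repeated retry notices into a count per page. The following is a minimal sketch, not part of the WiredTiger test suite: the regular expression is inferred only from the two notice lines quoted above, and the names `RETRY_RE` and `summarize_retries` are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two retry notices quoted in this ticket;
# other builds may format the message differently (assumption).
RETRY_RE = re.compile(
    r"retry #(?P<attempt>\d+) for page_id (?P<page_id>\d+), lsn (?P<lsn>\d+), "
    r"checkpoint_id (?P<checkpoint_id>\d+), "
    r"reconciliation_id (?P<reconciliation_id>\d+), "
    r"size (?P<size>\d+), checksum (?P<checksum>[0-9a-fA-F]+)"
)


def summarize_retries(lines):
    """Count retry notices per (page_id, lsn, checkpoint_id)."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            key = (
                int(match["page_id"]),
                int(match["lsn"]),
                int(match["checkpoint_id"]),
            )
            counts[key] += 1
    return counts


# Two shortened sample lines modelled on the output above.
sample = [
    "close_ckpt: [WT_VERB_READ][NOTICE]: retry #0 for page_id 101, lsn 4, "
    "checkpoint_id 1, reconciliation_id 0, size 63, checksum 17e9acd4",
    "close_ckpt: [WT_VERB_READ][NOTICE]: retry #1 for page_id 101, lsn 4, "
    "checkpoint_id 1, reconciliation_id 0, size 63, checksum 17e9acd4",
]

counts = summarize_retries(sample)
for (page_id, lsn, checkpoint_id), n in counts.items():
    print(f"page_id={page_id} lsn={lsn} checkpoint_id={checkpoint_id}: {n} retries")
```

Feeding the captured test output through `summarize_retries` line by line shows whether every retry targets the same page, as the two quoted notices suggest, or whether several pages are affected.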

      Here's a stack trace:

      #0  __block_disagg_read_multiple (session=0xaaaaab50aae8, block_disagg=0xaaaaab0b6770, block_meta=0xffffffffb030, page_id=101, lsn=4, checkpoint_id=1, reconciliation_id=0, size=63, checksum=2950897388, 
          results_array=0xffffffffb080, results_count=0xffffffffaf9c) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/block_disagg/block_disagg_read.c:113
      #1  0x0000fffff70a6678 in __wti_block_disagg_read_multiple (bm=0xaaaaab488ed0, session=0xaaaaab50aae8, block_meta=0xffffffffb030, addr=0xffffffffbcd3 "\253\252\252", addr_size=11, buffer_array=0xffffffffb080, 
          buffer_count=0xffffffffaf9c) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/block_disagg/block_disagg_read.c:304
      #2  0x0000fffff7097b8c in __wt_blkcache_read (session=0xaaaaab50aae8, buf=0xffffffffba10, block_meta=0xffffffffba38, addr=0xffffffffbcc8 "\300%\204\201\200\277\344\257\342\376\254\253\252\252", addr_size=11)
          at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/block_cache/block_io.c:130
      #3  0x0000fffff70fe0a4 in __wti_btree_tree_open (session=0xaaaaab50aae8, addr=0xffffffffbcc8 "\300%\204\201\200\277\344\257\342\376\254\253\252\252", addr_size=11)
          at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/btree/bt_handle.c:757
      #4  0x0000fffff70fc1f8 in __wt_btree_open (session=0xaaaaab50aae8, op_cfg=0x0) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/btree/bt_handle.c:157
      #5  0x0000fffff71df800 in __wt_conn_dhandle_open (session=0xaaaaab50aae8, cfg=0x0, flags=0) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/conn/conn_dhandle.c:624
      #6  0x0000fffff73cb3c0 in __wt_session_get_dhandle (session=0xaaaaab50aae8, uri=0xfffff7485520 "file:WiredTigerShared.wt_stable", checkpoint=0x0, cfg=0x0, flags=0)
          at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/session/session_dhandle.c:944
      #7  0x0000fffff73cb378 in __wt_session_get_dhandle (session=0xaaaaab50aae8, uri=0xfffff7485520 "file:WiredTigerShared.wt_stable", checkpoint=0x0, cfg=0x0, flags=0)
          at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/session/session_dhandle.c:937
      #8  0x0000fffff71aa4d0 in __checkpoint_db_internal (session=0xaaaaab50aae8, cfg=0xffffffffd640) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/checkpoint/checkpoint_txn.c:1534
      #9  0x0000fffff71ab0a0 in __checkpoint_db_wrapper (session=0xaaaaab50aae8, cfg=0xffffffffd640) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/checkpoint/checkpoint_txn.c:1709
      #10 0x0000fffff71ab378 in __wt_checkpoint_db (session=0xaaaaab50aae8, cfg=0xffffffffd640, waiting=true) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/checkpoint/checkpoint_txn.c:1788
      #11 0x0000fffff74310c8 in __wt_txn_global_shutdown (session=0xaaaaab505460, cfg=0xffffffffd790) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/txn/txn.c:2688
      #12 0x0000fffff71c8d24 in __conn_close (wt_conn=0xaaaaab4e3c40, config=0x0) at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/src/conn/conn_api.c:1241
      #13 0x0000fffff75fa794 in _wrap_Connection_close (self=0xfffff7791080, args=0xfffff6718250)
          at /home/dda/wt/git/wt-14782-python-disagg-triage-checkpoint/build/lang/python/CMakeFiles/wiredtiger_python.dir/wiredtigerPYTHON_wrap.c:7915

              Assignee:
              [DO NOT USE] Backlog - Storage Engines Team
              Reporter:
              Donald Anderson