[Disagg] checkpoint-cleanup thread asserts "schema lock acquired during role transition" in switch mode

    • Type: Build Failure
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Schema Management

      format-stress-test-disagg-switch-data-validation-1 on amazon2023-disagg-stress

      Host: i-0deab58379c6a5796
      Project: wiredtiger-disagg
      Commit: 6e6b861b
      Please refer to BF(G) Playbook for instructions on handling BF and BFG tickets as well as Auto-Resolution Rules

      Task Logs:

      format-stress-test-disagg-switch-data-validation-1 task_log

      Logs:

      Task logger initialized (agent version '2026-05-27' from Evergreen build revision '8b7c6af63f3db2a23124c01a53c0f5690ebdb417').
      Starting task 'wiredtiger_disagg_amazon2023_disagg_stress_format_stress_test_disagg_switch_data_validation_1_6e6b861bd88329820dfea789f5b23743d3d14c91_26_05_28_02_11_09', execution 0.
      Running pre-task commands.
      Running command 'shell.exec' in function 'cleanup' (step 1 of 2) in block 'pre'.
      Finished command 'shell.exec' in function 'cleanup' (step 1 of 2) in block 'pre' in 2.267935ms.
      Running command 'expansions.update' in function 'setup environment' (step 2 of 2) in block 'pre'.
      Finished command 'expansions.update' in function 'setup environment' (step 2 of 2) in block 'pre' in 32.006µs.
      Finished running pre-task commands in 2.493014ms.
      Running task commands.
      Running command 'git.get_project' in function 'get project' (step 1 of 3).
      Manifest loaded successfully.
      Fetching source from git...
      Commands are: set -o xtrace
      chmod -R 755 wiredtiger
      set -o errexit
      rm -rf wiredtiger
      set +o xtrace
      echo "git clone https://x-access-token:<REDACTED:EVERGREEN_GENERATED_GITHUB_TOKEN>@github.com/wiredtiger/wiredtiger.git 'wiredtiger' --branch 'develop'"
      git clone https://x-access-token:<REDACTED:EVERGREEN_GENERATED_GITHUB_TOKEN>@github.com/wiredtiger/wiredtiger.git 'wiredtiger' --branch 'develop'
      set -o xtrace
      cd wiredtiger
      git reset --hard 6e6b861bd88329820dfea789f5b23743d3d14c91
      git log --oneline -n 10
      git clone https://x-access-token:<REDACTED:EVERGREEN_GENERATED_GITHUB_TOKEN>@github.com/wiredtiger/wiredtiger.git 'wiredtiger' --branch 'develop'
      HEAD is now at 6e6b861bd8 WT-17659 Rename numbered layered Python tests (Pt 3: Schema/config) (#13912)
      6e6b861bd8 WT-17659 Rename numbered layered Python tests (Pt 3: Schema/config) (#13912)
      2d43c6ac5f WT-17616 Update comment guidelines in contributing guide and AGENTS.md (#13876)
      1b24ed7f7a WT-17338 Auto-pick up latest checkpoint in disagg follower mode for wt tool (#13856)
      ba2b12e03c WT-15768 Remove disagg block manager infinite loop (#13886)
      7109669afd WT-17455 Set missing time point flags when ingest drain resolves prepared on stable (#13906)
      1c96ddf7de WT-17650 Replace WT_DISAGG_SLOW_TRUNCATE_BUILD with debug_mode runtime knobs (#13899)
      480998248d WT-17656 Assert that we never perform schema ops during step-up/step-down (#13897)
      61c2df0bee WT-17658 Rename numbered layered Python tests (Pt 2: Write/durability) (#13907)
      816bf058a5 WT-17657 Implement layered cursor debug dump (#13901)
      c4a7946d89 WT-17570 Rename numbered layered Python tests (Pt 1: Read/access path) (#13903)
      Finished command 'git.get_project' in function 'get project' (step 1 of 3) in 9.830527574s.
      Running command 'github.generate_token' in function 'compile wiredtiger' (step 2.1 of 3).
      Requesting a GitHub dynamic access token with owner:wiredtiger, repository:automation-scripts, permissions:[Contents:read, Metadata:read]
      Created a GitHub dynamic access token. The token has the following permissions: [Contents:read, Metadata:read]
      Finished command 'github.generate_token' in function 'compile wiredtiger' (step 2.1 of 3) in 168.844146ms.
      Running command 'shell.exec' in function 'compile wiredtiger' (step 2.2 of 3).
      max_attempts=5
      command="git clone https://x-access-token:<REDACTED:generated_token>@github.com/wiredtiger/automation-scripts.git"
      if ! [ -d "./automation-scripts" ]; then
        for attempt in $(seq 1 $max_attempts); do
          echo "Attempt $attempt of $max_attempts cloning automation-scripts'"
          $command
          # Check the exit status of the command
          if [ $? -eq 0 ]; then
            echo "Clone succeeded on attempt $attempt."
            exit 0
          else
            if [ $attempt -eq $max_attempts ]; then
              echo "Clone failed after $max_attempts attempts."
              exit 1
            fi
          fi
          # Delay before reattempting the clone
          sleep 1
      Attempt 1 of 5 cloning automation-scripts'
      Clone succeeded on attempt 1.
        done
      fi
      Cloning into 'automation-scripts'...
      Finished command 'shell.exec' in function 'compile wiredtiger' (step 2.2 of 3) in 332.220302ms.
      Running command 'shell.exec' in function 'compile wiredtiger' (step 2.3 of 3).
      Dump Environment
       eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot "$@"
      BASH_FUNC_which%%=() {  ( alias;
      CI=true
      EVR_AGENT_PID=3664
      EVR_TASK_ID=wiredtiger_disagg_amazon2023_disagg_stress_format_stress_test_disagg_switch_data_validation_1_6e6b861bd88329820dfea789f5b23743d3d14c91_26_05_28_02_11_09
      GOCACHE=/data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/.gocache
      GOTRACEBACK=none
      HISTCONTROL=ignoredups
      HISTSIZE=1000
      HOME=/home/ec2-user
      HOSTNAME=ip-10-122-63-90.ec2.internal
      INVOCATION_ID=d5836ac7e05b4106b171c0cebe19b808
      JASPER_ID=6fe52835-1989-4d9d-8321-e9378670821d
      JASPER_MANAGER=9009fd92-1b5d-4bfd-8ef2-6da4e052525c
      JOURNAL_STREAM=8:10074
      LANG=C.UTF-8
      LC_ALL=C
      LESSOPEN=||/usr/bin/lesspipe.sh %s
      LOGNAME=ec2-user
      MAIL=/var/spool/mail/ec2-user
      PATH=/opt/mongodbtoolchain/v5/bin:/usr/sbin:/usr/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/node/bin:/opt/node/bin:/opt/node/bin
      PWD=/data/mci/515108b045e8db5a64f3525e12009198/wiredtiger
      SHELL=/bin/bash
      SHLVL=1
      SYSTEMD_COLORS=false
      SYSTEMD_EXEC_PID=3400
      S_COLORS=auto
      TEMP=/data/mci/515108b045e8db5a64f3525e12009198/tmp
      TMP=/data/mci/515108b045e8db5a64f3525e12009198/tmp
      TMPDIR=/data/mci/515108b045e8db5a64f3525e12009198/tmp
      USER=ec2-user
      _=/usr/bin/printenv
      s3_access_key=<REDACTED:s3_access_key>
      s3_bucket_tcmalloc=s3://boxes.10gen.com
      s3_secret_key=<REDACTED:s3_secret_key>
      which_declare=declare -f
      }
      Using config flags    -DCMAKE_INSTALL_PREFIX=/data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/cmake_build/LOCAL_INSTALL                                                -DENABLE_COLORIZE_OUTPUT=0
      SWIG version  is earlier than 4.0.0. Installing a newer version 4.2.1 ...
      Collecting swig==4.2.1
        Downloading swig-4.2.1-py2.py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.9 MB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 55.6 MB/s eta 0:00:00
      test/evergreen/ensure_swig_version.sh: line 15: swig: command not found
      WARNING: You are using pip version 22.0.4; however, version 26.1.1 is available.
      You should consider upgrading via the '/data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/venv/bin/python3 -m pip install --upgrade pip' command.
      Installing collected packages: swig
      Successfully installed swig-4.2.1
      SWIG Version 4.2.1
      Compiled with /opt/rh/devtoolset-10/root/usr/bin/c++ [Linux]
      Configured options: +pcre
      Please see https://www.swig.org for reporting bugs and further information
      Attempting to download prebuilt tcmalloc: s3://boxes.10gen.com/build/wt_prebuilt_tcmalloc/mongo-20240522/tcmalloc-mongo-20240522-amazon2023-disagg-stress.tgz
      Find CMake
      ==========================================================
      CMake and CTest environment variables, paths and versions:
      CMAKE: /opt/mongodbtoolchain/v5/bin/cmake
      CTEST: /opt/mongodbtoolchain/v5/bin/ctest
      /opt/mongodbtoolchain/v5/bin/cmake
      /opt/mongodbtoolchain/v5/bin/ctest
      cmake version 3.21.2
      CMake suite maintained and supported by Kitware (kitware.com/cmake).
      ctest version 3.21.2
      CMake suite maintained and supported by Kitware (kitware.com/cmake).
      ==========================================================
      Remove the cmake_build directory, if it already exists
      Create a new cmake_build directory
      Calling CMake with command:
      /opt/mongodbtoolchain/v5/bin/cmake --preset linux-gcc -DCMAKE_INSTALL_PREFIX=/data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/cmake_build/LOCAL_INSTALL -DENABLE_COLORIZE_OUTPUT=0 -G Ninja ./..
      Preset CMake variables:
        CMAKE_CXX_COMPILER="/opt/mongodbtoolchain/v5/bin/g++"
        CMAKE_C_COMPILER="/opt/mongodbtoolchain/v5/bin/gcc"
      Preset environment variables:
        MONGODBTOOLCHAIN_BIN="/opt/mongodbtoolchain/v5/bin"
      -- The C compiler identification is GNU 14.2.0
      -- The CXX compiler identification is GNU 14.2.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /opt/mongodbtoolchain/v5/bin/gcc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /opt/mongodbtoolchain/v5/bin/g++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Performing Test HAVE_BUILD_ASan_C_FLAGS
      -- Performing Test HAVE_BUILD_ASan_C_FLAGS - Failed
      -- Performing Test HAVE_BUILD_UBSan_C_FLAGS
      -- Performing Test HAVE_BUILD_UBSan_C_FLAGS - Success
      -- Performing Test HAVE_BUILD_UBSan_CXX_FLAGS
      -- Performing Test HAVE_BUILD_UBSan_CXX_FLAGS - Success
      -- Performing Test HAVE_BUILD_TSan_C_FLAGS
      -- Performing Test HAVE_BUILD_TSan_C_FLAGS - Failed
      -- Performing Test HAVE_BUILD_Coverage_C_FLAGS
      -- Performing Test HAVE_BUILD_Coverage_C_FLAGS - Success
      -- Performing Test HAVE_BUILD_Coverage_CXX_FLAGS
      -- Performing Test HAVE_BUILD_Coverage_CXX_FLAGS - Success
      -- Using ccache: /opt/mongodbtoolchain/v5/bin/ccache
      -- Performing Test HAVE_RCPC
      -- Performing Test HAVE_RCPC - Success
      -- Performing Test has_moutline_atomics
      -- Performing Test has_moutline_atomics - Success
      -- Looking for pthread.h
      -- Looking for pthread.h - found
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Looking for library lz4
      -- Looking for library lz4 - found
      -- Looking for library snappy
      -- Looking for library snappy - found
      -- Looking for library z
      -- Looking for library z - found
      -- Looking for library zstd
      -- Looking for library zstd - found
      -- Looking for library sodium
      -- Looking for library sodium - not found
      -- Looking for library qpl
      -- Looking for library qpl - not found
      -- Looking for library accel-config
      -- Looking for library accel-config - not found
      -- Looking for library memkind
      -- Looking for library memkind - not found
      -- Looking for library SQLite3
      -- Looking for library SQLite3 - Found inter
      


      [1779945239:325244][11466:0xffff9e3edbc0], t, file:T00002.wt_stable, checkpoint-cleanup: [WT_VERB_DEFAULT][ERROR]: __wt_session_get_dhandle, 989: WiredTiger assertion failed: '!(__wt_atomic_load_uint32_relaxed(&((((WT_CONNECTION_IMPL *)((WT_SESSION_IMPL *)(session))->iface.connection))->flags_atomic)) & (uint32_t)(0x10000u | 0x08000u)) || ((((session)->flags) & (0x80000000u | 0x40000000u)) != 0)'. schema lock acquired during role transition
      [1779945239:325273][11466:0xffff9e3edbc0], t, file:T00002.wt_stable, checkpoint-cleanup: [WT_VERB_DEFAULT][ERROR]: __wt_abort, 29: aborting WiredTiger library
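
For triage, the asserted condition above reads as follows: if either of two connection-level bits (0x10000u, 0x08000u) is set, the session must carry at least one of two session-level bits (0x80000000u, 0x40000000u). The sketch below restates that predicate in standalone C. The macro and function names are hypothetical, since the log shows only the raw masks. That the connection bits mark a role transition is an inference from the assertion message, not something the log states. The checked-out history includes WT-17656 ("Assert that we never perform schema ops during step-up/step-down"), which may be where the check originates. This is not WiredTiger code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the numeric masks come from the assertion text. */
#define CONN_TRANSITION_MASK (0x10000u | 0x08000u)      /* bits tested in conn->flags_atomic */
#define SESSION_EXEMPT_MASK (0x80000000u | 0x40000000u) /* bits tested in session->flags */

/*
 * Return true when the asserted condition holds, false when the assertion
 * would fire.
 */
static bool
schema_lock_allowed(uint32_t conn_flags, uint32_t session_flags)
{
    return (!(conn_flags & CONN_TRANSITION_MASK) ||
      (session_flags & SESSION_EXEMPT_MASK) != 0);
}

/* Three illustrative cases for the predicate. */
static void
example_cases(void)
{
    /* No transition bit set: the condition holds for any session. */
    assert(schema_lock_allowed(0x0u, 0x0u));
    /* Transition bit set, no exempt session bit: the condition fails. */
    assert(!schema_lock_allowed(0x10000u, 0x0u));
    /* Transition bit set, exempt session bit set: the condition holds. */
    assert(schema_lock_allowed(0x08000u, 0x40000000u));
}
```

Calling example_cases() from a main function runs the three checks. The second case matches the failure reported here only under the assumption that the checkpoint-cleanup session had neither exempt bit set when it called __wt_session_get_dhandle.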
      


      bash: line 67: 11466 Aborted                 (core dumped) ./t -c ../../../test/format/CONFIG.disagg disagg.mode=switch ops.verify=1 runs.mirror=1 table1.runs.source=table table1.disagg.enabled=0 runs.timer=5:10 ops.prepare=1
       Iteration 5/5
      t: process 11466 running
      ======= FAILURE ==========
      ############################################
      #  RUN PARAMETERS: V3
      ############################################
      assert.read_timestamp=0
      background_compact=0
      background_compact.free_space_target=32
      backup=0
      backup.incremental=off
      backup.incr_granularity=14189
      backup.live_restore=1
      backup.live_restore_read_size=1024
      backup.live_restore_threads=7
      block_cache=0
      block_cache.cache_on_checkpoint=0
      block_cache.cache_on_writes=0
      block_cache.size=58
      cache=3072
      cache.evict_max=1
      cache.eviction_dirty_target=0
      cache.eviction_dirty_trigger=0
      cache.eviction_updates_target=0
      cache.eviction_updates_trigger=0
      cache.minimum=0
      cache.maximum=0
      checkpoint=on
      checkpoint.log_size=120
      checkpoint.wait=10
      checkpoint_threads=2
      compact.free_space_target=85
      debug.background_compact=0
      debug.checkpoint_retention=1
      debug.cursor_reposition=0
      debug.disagg_slow_truncate_follower=1
      debug.eviction=0
      debug.log_retention=7
      debug.realloc_exact=0
      debug.realloc_malloc=0
      debug.slow_checkpoint=0
      debug.slow_truncate=0
      debug.table_logging=0
      debug.update_restore_evict=0
      disagg.internal_page_delta=1
      disagg.leaf_page_delta=1
      disagg.multi=0
      disagg.multi_validation=0
      disagg.enabled=1
      disagg.layered=1
      disagg.mode=switch
      disagg.page_log=palite
      disagg.key_provider=0
      disagg.page_log.verbose=0
      disagg.drain_threads=4
      disagg.preserve=0
      disk.data_extend=0
      disk.encryption=none
      disk.mmap=1
      disk.mmap_all=0
      eviction.evict_use_softptr=1
      file_manager.close_handle_minimum=8
      file_manager.close_idle_time=59
      file_manager.close_scan_interval=6
      format.abort=0
      format.independent_thread_rng=1
      format.major_timeout=0
      import=0
      logging=0
      logging.compression=none
      logging.file_max=354147
      logging.prealloc=0
      logging.remove=1
      obsolete_cleanup.method=off
      obsolete_cleanup.wait=24
      ops.alter=0
      ops.compaction=0
      ops.hs_cursor=0
      ops.bound_cursor=0
      ops.prepare=1
      ops.reserve=10
      ops.random_cursor=0
      ops.salvage=0
      ops.throttle=0
      ops.throttle.sleep_us=387559
      ops.verify=1
      prefetch=0
      prefetch.default=0
      precise_checkpoint=1
      preserve_prepared=1
      quiet=1
      random.data_seed=15298184
      random.extra_seed=7085433
      rollback_to_stable_threads=10
      runs.in_memory=0
      runs.mirror=1
      runs.ops=0
      runs.predictable_replay=0
      runs.source=layered
      runs.tables=3
      runs.threads=21
      runs.timer=5
      runs.type=row-store
      runs.verify_failure_dump=0
      statistics.mode=fast
      statistics_log.sources=off
      stress.aggressive_stash_free=0
      stress.aggressive_sweep=0
      stress.checkpoint=0
      stress.checkpoint_evict_page=0
      stress.checkpoint_prepare=0
      stress.compact_slow=0
      stress.evict_reposition=0
      stress.failpoint_eviction_split=0
      stress.failpoint_hs_delete_key_from_ts=0
      stress.failpoint_rec_before_wrapup=0
      stress.hs_checkpoint_delay=0
      stress.hs_search=0
      stress.hs_sweep=0
      stress.prefetch_delay=0
      stress.prepare_resolution_1=0
      stress.sleep_before_read_overflow_onpage=0
      stress.split_1=0
      stress.split_2=0
      stress.split_3=0
      stress.split_4=0
      stress.split_5=0
      stress.split_6=0
      stress.split_7=0
      stress.split_8=0
      tiered_storage.flush_frequency=0
      tiered_storage.storage_source=off
      transaction.implicit=0
      transaction.operation_timeout_ms=2000
      transaction.timestamps=1
      wiredtiger.config=off
      wiredtiger.rwlock=0
      wiredtiger.leak_memory=0
      ############################################
      #  TABLE PARAMETERS: table 1
      ############################################
      table1.btree.compression=none
      table1.btree.dictionary=1
      table1.btree.internal_key_truncation=1
      table1.btree.internal_page_max=9
      table1.btree.key_max=100
      table1.btree.key_min=25
      table1.btree.leaf_page_max=13
      table1.btree.memory_page_max=1
      table1.btree.prefix_len=0
      table1.btree.prefix_compression=1
      table1.btree.prefix_compression_min=5
      table1.btree.reverse=0
      table1.btree.split_pct=83
      table1.btree.value_max=83
      table1.btree.value_min=17
      table1.disagg.enabled=0
      table1.disk.checksum=on
      table1.disk.firstfit=0
      table1.ops.pareto=0
      table1.ops.pareto.skew=36
      table1.ops.pct.delete=16
      table1.ops.pct.insert=28
      table1.ops.pct.modify=32
      table1.ops.pct.read=0
      table1.ops.pct.write=24
      table1.ops.truncate=1
      table1.runs.rows=652056
      table1.runs.source=table
      ############################################
      #  TABLE PARAMETERS: table 2
      ############################################
      table2.btree.compression=none
      table2.btree.dictionary=0
      table2.btree.internal_key_truncation=1
      table2.btree.internal_page_max=10
      table2.btree.key_max=63
      table2.btree.key_min=26
      table2.btree.leaf_page_max=15
      table2.btree.memory_page_max=3
      table2.btree.prefix_len=0
      table2.btree.prefix_compression=1
      table2.btree.prefix_compression_min=8
      table2.btree.reverse=0
      table2.btree.split_pct=82
      table2.btree.value_max=1446
      table2.btree.value_min=1
      table2.disk.checksum=on
      table2.disk.firstfit=0
      table2.ops.pareto=0
      table2.ops.pareto.skew=92
      table2.ops.pct.delete=5
      table2.ops.pct.insert=75
      table2.ops.pct.modify=2
      table2.ops.pct.read=13
      table2.ops.pct.write=5
      table2.ops.truncate=1
      table2.runs.rows=652056
      ############################################
      #  TABLE PARAMETERS: table 3
      ############################################
      table3.btree.compression=none
      table3.btree.dictionary=0
      table3.btree.internal_key_truncation=1
      table3.btree.internal_page_max=9
      table3.btree.key_max=101
      table3.btree.key_min=31
      table3.btree.leaf_page_max=12
      table3.btree.memory_page_max=7
      table3.btree.prefix_len=0
      table3.btree.prefix_compression=1
      table3.btree.prefix_compression_min=3
      table3.btree.reverse=0
      table3.btree.split_pct=91
      table3.btree.value_max=3403
      table3.btree.value_min=2
      table3.disk.checksum=unencrypted
      table3.disk.firstfit=0
      table3.ops.pareto=0
      table3.ops.pareto.skew=47
      table3.ops.pct.delete=0
      table3.ops.pct.insert=38
      table3.ops.pct.modify=62
      table3.ops.pct.read=0
      table3.ops.pct.write=0
      table3.ops.truncate=1
      table3.runs.rows=652056
      Command 'shell.exec' in function 'format test disagg' (step 3 of 3) failed: shell script encountered problem: exit code 1.
      Finished command 'shell.exec' in function 'format test disagg' (step 3 of 3) in 1h4m23.149885869s.
      Running task commands failed: running command: command failed: shell script encountered problem: exit code 1
      Finished running task commands in 1h5m27.812069423s.
      Task completed - FAILURE.
      Running post-task commands.
      


      #0  0x0000ffffa34c4454 in __pthread_kill_implementation () from /lib64/libc.so.6
      #1  0x0000ffffa347b320 [PAC] in raise () from /lib64/libc.so.6
      #2  0x0000ffffa3462224 [PAC] in abort () from /lib64/libc.so.6
      #3  0x0000ffffa384dea4 [PAC] in __wt_abort (session=session@entry=0x52b739772000) at /data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/src/os_common/os_abort.c:32
      #4  0x0000ffffa38e0d88 in __wt_session_get_dhandle (session=session@entry=0x52b739772000, uri=0x52b7ac49e000 "file:T00002.wt_stable", checkpoint=checkpoint@entry=0x0, cfg=cfg@entry=0x0, flags=flags@entry=0) at /data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/src/session/session_dhandle.c:989
      #5  0x0000ffffa36fb808 in __checkpoint_cleanup_walk_btree (session=session@entry=0x52b739772000, uri=0x52b74dea8300) at /data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/src/btree/bt_sync_obsolete.c:499
      #6  0x0000ffffa36fbe08 in __checkpoint_cleanup_int (session=session@entry=0x52b739772000) at /data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/src/btree/bt_sync_obsolete.c:765
      #7  0x0000ffffa36fc09c in __checkpoint_cleanup (arg=0x52b739772000) at /data/mci/515108b045e8db5a64f3525e12009198/wiredtiger/src/btree/bt_sync_obsolete.c:849
      #8  0x0000ffffa34c2834 in start_thread () from /lib64/libc.so.6
      #9  0x0000ffffa3466e5c [PAC] in thread_start () from /lib64/libc.so.6
      


            Assignee:
            Alexander Pullen
            Reporter:
            xgen-buildbaron-user
            Votes:
            0
            Watchers:
            2
