WiredTiger / WT-5183

test/format rebalance failed when cache got stuck

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • 2
    • Storage Engines 2019-12-02

      A similar test/format rebalance failure was tracked in WT-4329, which was closed as "cannot reproduce". The failure has reappeared in a sanitizer job run on tinderbox:

      http://build.wiredtiger.com:8080/job/wiredtiger-test-race-condition-stress-sanitizer/33667/

      t: FAILED: wts_rebalance/65: system(cmd): command failed: ../../wt -h RUNDIR dump -f RUNDIR/rebalance.new table:wt: Unknown error 134
      process aborting
      /tmp/jenkins1231660298258164626.sh: line 34: 16841 Aborted                 (core dumped) nice ./t -1 -c CONFIG 
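
      The "Unknown error 134" in the failure message is probably not an errno. Assuming test/format reports the raw return value of system(3) (an assumption, not confirmed in this ticket), 134 is a wait status: on Linux the low seven bits hold the terminating signal and bit 0x80 is the core-dump flag. Under that assumption the wt utility was killed by signal 6 (SIGABRT), which is consistent with the __wt_abort call logged by the wt process (pid 18068). A minimal sketch of the decoding, using the Linux wait-status layout:

```python
import signal


def decode_wait_status(status: int) -> dict:
    """Decode a raw wait status using the Linux bit layout (assumed here)."""
    term_signal = status & 0x7F        # low 7 bits: terminating signal, 0 on normal exit
    core_dumped = bool(status & 0x80)  # bit 7: set when a core file was written
    exit_code = (status >> 8) & 0xFF   # next byte: exit code for a normal exit
    name = signal.Signals(term_signal).name if term_signal else None
    return {
        "signal": term_signal,
        "signal_name": name,
        "core_dumped": core_dumped,
        "exit_code": exit_code,
    }


# 134 is the value printed in the "Unknown error 134" message.
result = decode_wait_status(134)
print(result)
```

      If the value were instead a shell-style exit code, 134 = 128 + 6 would point at the same signal.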

      Cache dump:

      ++ nice ./t -1 -c CONFIG
      [1570778015:897912][16841:0x7f54c15ec700], t, WT_SESSION.compact: __compact_worker, 300: compaction halted by eviction pressure: Device or resource busy
      [1570778097:870662][16841:0x7f54cbfae700], t, WT_SESSION.compact: __compact_worker, 300: compaction halted by eviction pressure: Device or resource busy
      [1570778170:105682][16841:0x7f54c5dea700], t, WT_SESSION.compact: __compact_worker, 300: compaction halted by eviction pressure: Device or resource busy
      [1570778556:403010][18068:0x7f14251f6700], wt, eviction-server: __evict_server, 446: Cache stuck for too long, giving up: Connection timed out
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      transaction state dump
      current ID: 3
      last running ID: 3
      metadata_pinned ID: 3
      oldest ID: 3
      durable timestamp: (0,0)
      oldest timestamp: (0,0)
      pinned timestamp: (0,0)
      stable timestamp: (0,0)
      has_durable_timestamp: no
      has_oldest_timestamp: no
      has_pinned_timestamp: no
      has_stable_timestamp: no
      oldest_is_pinned: no
      stable_is_pinned: no
      checkpoint running: no
      checkpoint generation: 1
      checkpoint pinned ID: 0
      checkpoint txn ID: 0
      oldest named snapshot ID: 0
      session count: 14
      Transaction state of active sessions:
      =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      cache dump
      cache full: yes
      cache clean check: yes (121.612%)
      cache dirty check: no (0.000%)
      file:wt.wt(<live>):
      internal: 1 pages, 58MB, 1/0 clean/dirty pages, 58/0 clean/dirty MB, 58MB max page, 0MB max dirty page
      leaf: 0 pages
      file:WiredTigerLAS.wt(<live>) eviction disabled at open:
      internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 0 pages
      file:WiredTiger.wt(<live>):
      internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      leaf: 0 pages
      cache dump: total found: 63MB vs tracked inuse 63MB
      total dirty bytes: 0MB
      [1570778556:403959][18068:0x7f14251f6700], wt, eviction-server: __wt_evict_thread_run, 321: cache eviction thread error: Connection timed out
      [1570778556:403985][18068:0x7f14251f6700], wt, eviction-server: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
      [1570778556:404007][18068:0x7f14251f6700], wt, eviction-server: __wt_abort, 28: aborting WiredTiger library
      t: process 16841 
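
      One way to read the cache dump: the only sizeable page is a single 58MB internal page in file:wt.wt, while the run was configured with cache=52. Assuming that setting is a cache size in MB (an assumption about test/format, not stated in this ticket), the page alone exceeds the cache, and 63MB in use against 52MB is roughly the 121% reported by the clean check. The sketch below parses the per-file lines of the dump and flags pages larger than the cache; the helper name and regular expression are illustrative and not part of WiredTiger.

```python
import re

# Per-file section of the cache dump, copied from the log above.
CACHE_DUMP = """\
file:wt.wt(<live>):
internal: 1 pages, 58MB, 1/0 clean/dirty pages, 58/0 clean/dirty MB, 58MB max page, 0MB max dirty page
leaf: 0 pages
file:WiredTigerLAS.wt(<live>) eviction disabled at open:
internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
leaf: 0 pages
file:WiredTiger.wt(<live>):
internal: 1 pages, 0MB, 0/1 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
leaf: 0 pages
"""

CACHE_SIZE_MB = 52  # cache=52 from the run configuration; the MB unit is assumed

PAGE_LINE = re.compile(r"^(internal|leaf): (\d+) pages, (\d+)MB, .* (\d+)MB max page,")


def summarize(dump: str) -> list:
    """Return (file, page_type, total_mb, max_page_mb) for each populated line."""
    rows, current = [], None
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("file:"):
            current = line.split("(")[0]
            continue
        match = PAGE_LINE.match(line)
        if match and current:
            rows.append((current, match.group(1), int(match.group(3)), int(match.group(4))))
    return rows


rows = summarize(CACHE_DUMP)
oversized = [row for row in rows if row[3] > CACHE_SIZE_MB]
fill_pct = 63 / CACHE_SIZE_MB * 100  # "tracked inuse 63MB" from the dump
print(oversized, f"{fill_pct:.1f}%")
```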

      The configuration:

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      alter=0
      assert_commit_timestamp=0
      assert_read_timestamp=0
      auto_throttle=1
      backups=1
      bitcnt=2
      bloom=1
      bloom_bit_count=58
      bloom_hash_count=31
      bloom_oldest=0
      cache=52
      cache_minimum=20
      checkpoints=on
      checkpoint_log_size=181
      checkpoint_wait=61
      checksum=on
      chunk_size=8
      compaction=1
      compression=snappy
      data_extend=0
      data_source=table
      delete_pct=6
      dictionary=0
      direct_io=0
      encryption=none
      evict_max=4
      file_type=row-store
      firstfit=0
      huffman_key=0
      huffman_value=0
      independent_thread_rng=1
      in_memory=0
      insert_pct=85
      internal_key_truncation=1
      internal_page_max=16
      isolation=snapshot
      key_gap=11
      key_max=58
      key_min=13
      leaf_page_max=14
      leak_memory=0
      logging=0
      logging_archive=1
      logging_compression=none
      logging_file_max=22386
      logging_prealloc=1
      long_running_txn=0
      lsm_worker_threads=4
      memory_page_max=2
      merge_max=16
      mmap=1
      modify_pct=3
      ops=0
      prefix_compression=1
      prefix_compression_min=7
      prepare=0
      quiet=1
      random_cursor=0
      read_pct=6
      rebalance=1
      repeat_data_pct=88
      reverse=1
      rows=1000000
      runs=1
      salvage=1
      split_pct=58
      statistics=0
      statistics_server=0
      threads=13
      timer=4
      timing_stress_aggressive_sweep=0
      timing_stress_checkpoint=0
      timing_stress_lookaside_sweep=0
      timing_stress_split_1=1
      timing_stress_split_2=0
      timing_stress_split_3=0
      timing_stress_split_4=1
      timing_stress_split_5=1
      timing_stress_split_6=0
      timing_stress_split_7=1
      timing_stress_split_8=0
      transaction_timestamps=1
      transaction-frequency=100
      truncate=1
      value_max=501
      value_min=16
      verify=1
      wiredtiger_config=
      write_pct=0
      ############################################ 
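
      For anyone scripting against a dump like this, the format is simple: key=value lines, with lines starting with # as comments and empty values allowed. The sketch below is a hypothetical helper (not part of test/format) that parses a few of the settings above and checks that the operation percentages in this run add up to 100.

```python
# A subset of the configuration above; the full dump parses the same way.
CONFIG_TEXT = """\
############################################
#  RUN PARAMETERS
############################################
cache=52
delete_pct=6
insert_pct=85
modify_pct=3
read_pct=6
rebalance=1
wiredtiger_config=
write_pct=0
############################################
"""


def parse_config(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key] = value  # values stay strings; an empty value is kept as ""
    return settings


settings = parse_config(CONFIG_TEXT)
op_keys = ["delete_pct", "insert_pct", "modify_pct", "read_pct", "write_pct"]
op_total = sum(int(settings[key]) for key in op_keys)
print(op_total)
```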

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            luke.chen@mongodb.com Luke Chen