Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-5521

Cache stuck during format initial load, configured with library checkpoints

    • 8
    • Storage Engines 2020-02-10, Storage Engines 2020-02-24

      Summary:
      Change reconciliation to detect and fail evictions that will eventually fail because of a running checkpoint.

      Change format to not configure WiredTiger library threads to perform checkpoints during bulk load, it leads to cache stuck failures we're not interested in fixing.


      The direct I/O Evergreen task recently captured a few cache stuck dirty cases, the size of the dirty leaf pages seem large (2-digits MB), after the timeout issue being resolved by WT-5458

      https://evergreen.mongodb.com/task/wiredtiger_ubuntu1804_linux_directio_40b746e33589e3660cb62ad283d7dae4ccd5c624_20_01_29_15_32_01

      https://evergreen.mongodb.com/task/wiredtiger_rhel80_linux_directio_168c7f1b3859516115721868f312544abb0caf5b_20_01_30_16_00_12

      https://evergreen.mongodb.com/task/wiredtiger_ubuntu1804_linux_directio_8183f4cd2264d7d5428a89067bbfe000a285e237_20_02_03_05_08_17

      The cache dump of the last failure: 

      [2020/02/03 18:07:14.544]     t: process 49959 running
      [2020/02/03 18:07:14.544]     [1580713634:451499][49959:0x7f51904f5700], t, eviction-server: __evict_server, 440: Cache stuck for too long, giving up: Connection timed out
      [2020/02/03 18:07:14.544]     0x7f5192866690:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      [2020/02/03 18:07:14.544]     0x7f5192866690:transaction state dump
      [2020/02/03 18:07:14.544]     0x7f5192866690:current ID: 60
      [2020/02/03 18:07:14.544]     0x7f5192866690:last running ID: 60
      [2020/02/03 18:07:14.544]     0x7f5192866690:metadata_pinned ID: 59
      [2020/02/03 18:07:14.544]     0x7f5192866690:oldest ID: 60
      [2020/02/03 18:07:14.544]     0x7f5192866690:durable timestamp: (0, 100)
      [2020/02/03 18:07:14.544]     0x7f5192866690:oldest timestamp: (0, 98)
      [2020/02/03 18:07:14.544]     0x7f5192866690:pinned timestamp: (0, 98)
      [2020/02/03 18:07:14.544]     0x7f5192866690:stable timestamp: (0, 0)
      [2020/02/03 18:07:14.544]     0x7f5192866690:has_durable_timestamp: yes
      [2020/02/03 18:07:14.544]     0x7f5192866690:has_oldest_timestamp: yes
      [2020/02/03 18:07:14.544]     0x7f5192866690:has_pinned_timestamp: yes
      [2020/02/03 18:07:14.544]     0x7f5192866690:has_stable_timestamp: no
      [2020/02/03 18:07:14.544]     0x7f5192866690:oldest_is_pinned: yes
      [2020/02/03 18:07:14.544]     0x7f5192866690:stable_is_pinned: no
      [2020/02/03 18:07:14.544]     0x7f5192866690:checkpoint running: yes
      [2020/02/03 18:07:14.544]     0x7f5192866690:checkpoint generation: 6
      [2020/02/03 18:07:14.544]     0x7f5192866690:checkpoint pinned ID: 59
      [2020/02/03 18:07:14.544]     0x7f5192866690:checkpoint txn ID: 59
      [2020/02/03 18:07:14.544]     0x7f5192866690:oldest named snapshot ID: 0
      [2020/02/03 18:07:14.544]     0x7f5192866690:session count: 16
      [2020/02/03 18:07:14.544]     0x7f5192866690:Transaction state of active sessions:
      [2020/02/03 18:07:14.544]     0x7f5192866690:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      [2020/02/03 18:07:14.544]     0x7f5192866690:cache dump
      [2020/02/03 18:07:14.544]     0x7f5192866690:cache full: no
      [2020/02/03 18:07:14.544]     0x7f5192866690:cache clean check: no (36.764%)
      [2020/02/03 18:07:14.544]     0x7f5192866690:cache dirty check: yes (26.423%)
      [2020/02/03 18:07:14.544]     0x7f5192866690:file:wt.wt(<live>):
      [2020/02/03 18:07:14.544]     0x7f5192866690:internal: 146 pages, 5MB, 144/2 clean/dirty pages, 5/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [2020/02/03 18:07:14.544]     0x7f5192866690:leaf: 6557 pages, 79MB, 6556/1 clean/dirty pages, 18/61 clean/dirty MB, 61MB max page, 61MB max dirty page
      [2020/02/03 18:07:14.544]     0x7f5192866690:file:WiredTigerLAS.wt(<live>) eviction disabled at open:
      [2020/02/03 18:07:14.544]     0x7f5192866690:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [2020/02/03 18:07:14.544]     0x7f5192866690:leaf: 0 pages
      [2020/02/03 18:07:14.544]     0x7f5192866690:file:WiredTiger.wt(<live>):
      [2020/02/03 18:07:14.544]     0x7f5192866690:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [2020/02/03 18:07:14.544]     0x7f5192866690:leaf: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
      [2020/02/03 18:07:14.544]     0x7f5192866690:cache dump: total found: 92MB vs tracked inuse 92MB
      [2020/02/03 18:07:14.544]     0x7f5192866690:total dirty bytes: 62MB
      [2020/02/03 18:07:14.544]     [1580713634:455422][49959:0x7f51904f5700], t, eviction-server: __wt_evict_thread_run, 315: cache eviction thread error: Connection timed out
      [2020/02/03 18:07:14.544]     [1580713634:455441][49959:0x7f51904f5700], t, eviction-server: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
      [2020/02/03 18:07:14.544]     [1580713634:455450][49959:0x7f51904f5700], t, eviction-server: __wt_abort, 28: aborting WiredTiger library
      [2020/02/03 18:07:14.544] format.sh: /data/mci/0ddb079853f71bbf59cb2fc376e99471/wiredtiger/test/format/RUNDIR.34/CONFIG: 

      The configuration:

      [2020/02/03 18:07:14.545]     ############################################
      [2020/02/03 18:07:14.545]     #  RUN PARAMETERS
      [2020/02/03 18:07:14.545]     ############################################
      [2020/02/03 18:07:14.545]     abort=0
      [2020/02/03 18:07:14.545]     alter=0
      [2020/02/03 18:07:14.545]     assert_commit_timestamp=0
      [2020/02/03 18:07:14.545]     assert_read_timestamp=0
      [2020/02/03 18:07:14.545]     auto_throttle=1
      [2020/02/03 18:07:14.545]     backups=0
      [2020/02/03 18:07:14.545]     bitcnt=8
      [2020/02/03 18:07:14.545]     bloom=1
      [2020/02/03 18:07:14.545]     bloom_bit_count=26
      [2020/02/03 18:07:14.545]     bloom_hash_count=17
      [2020/02/03 18:07:14.545]     bloom_oldest=0
      [2020/02/03 18:07:14.545]     cache=252
      [2020/02/03 18:07:14.545]     cache_minimum=20
      [2020/02/03 18:07:14.545]     checkpoints=wiredtiger
      [2020/02/03 18:07:14.545]     checkpoint_log_size=159
      [2020/02/03 18:07:14.545]     checkpoint_wait=20
      [2020/02/03 18:07:14.545]     checksum=uncompressed
      [2020/02/03 18:07:14.545]     chunk_size=1
      [2020/02/03 18:07:14.545]     compaction=0
      [2020/02/03 18:07:14.545]     compression=none
      [2020/02/03 18:07:14.545]     data_extend=0
      [2020/02/03 18:07:14.545]     data_source=table
      [2020/02/03 18:07:14.545]     delete_pct=15
      [2020/02/03 18:07:14.545]     dictionary=1
      [2020/02/03 18:07:14.545]     direct_io=1
      [2020/02/03 18:07:14.545]     encryption=rotn-7
      [2020/02/03 18:07:14.545]     evict_max=5
      [2020/02/03 18:07:14.545]     file_type=row-store
      [2020/02/03 18:07:14.545]     firstfit=0
      [2020/02/03 18:07:14.545]     huffman_key=0
      [2020/02/03 18:07:14.545]     huffman_value=0
      [2020/02/03 18:07:14.545]     independent_thread_rng=1
      [2020/02/03 18:07:14.545]     in_memory=0
      [2020/02/03 18:07:14.545]     insert_pct=52
      [2020/02/03 18:07:14.545]     internal_key_truncation=1
      [2020/02/03 18:07:14.545]     internal_page_max=9
      [2020/02/03 18:07:14.545]     isolation=snapshot
      [2020/02/03 18:07:14.545]     key_gap=1
      [2020/02/03 18:07:14.545]     key_max=27
      [2020/02/03 18:07:14.545]     key_min=25
      [2020/02/03 18:07:14.545]     leaf_page_max=12
      [2020/02/03 18:07:14.545]     leak_memory=0
      [2020/02/03 18:07:14.545]     logging=0
      [2020/02/03 18:07:14.545]     logging_archive=1
      [2020/02/03 18:07:14.545]     logging_compression=none
      [2020/02/03 18:07:14.545]     logging_file_max=283583
      [2020/02/03 18:07:14.545]     logging_prealloc=0
      [2020/02/03 18:07:14.545]     lsm_worker_threads=3
      [2020/02/03 18:07:14.545]     major_timeout=0
      [2020/02/03 18:07:14.545]     memory_page_max=6
      [2020/02/03 18:07:14.545]     merge_max=4
      [2020/02/03 18:07:14.545]     mmap=1
      [2020/02/03 18:07:14.545]     modify_pct=2
      [2020/02/03 18:07:14.545]     ops=0
      [2020/02/03 18:07:14.545]     prefix_compression=0
      [2020/02/03 18:07:14.545]     prefix_compression_min=7
      [2020/02/03 18:07:14.545]     prepare=0
      [2020/02/03 18:07:14.545]     quiet=1
      [2020/02/03 18:07:14.545]     random_cursor=0
      [2020/02/03 18:07:14.546]     read_pct=26
      [2020/02/03 18:07:14.546]     rebalance=0
      [2020/02/03 18:07:14.546]     repeat_data_pct=34
      [2020/02/03 18:07:14.546]     reverse=1
      [2020/02/03 18:07:14.546]     rows=1000000
      [2020/02/03 18:07:14.546]     runs=1
      [2020/02/03 18:07:14.546]     salvage=0
      [2020/02/03 18:07:14.546]     split_pct=57
      [2020/02/03 18:07:14.546]     statistics=0
      [2020/02/03 18:07:14.546]     statistics_server=0
      [2020/02/03 18:07:14.546]     threads=21
      [2020/02/03 18:07:14.546]     timer=4
      [2020/02/03 18:07:14.546]     timing_stress_aggressive_sweep=0
      [2020/02/03 18:07:14.546]     timing_stress_checkpoint=0
      [2020/02/03 18:07:14.546]     timing_stress_lookaside_sweep=0
      [2020/02/03 18:07:14.546]     timing_stress_split_1=0
      [2020/02/03 18:07:14.546]     timing_stress_split_2=0
      [2020/02/03 18:07:14.546]     timing_stress_split_3=0
      [2020/02/03 18:07:14.546]     timing_stress_split_4=0
      [2020/02/03 18:07:14.546]     timing_stress_split_5=0
      [2020/02/03 18:07:14.546]     timing_stress_split_6=0
      [2020/02/03 18:07:14.546]     timing_stress_split_7=0
      [2020/02/03 18:07:14.546]     timing_stress_split_8=0
      [2020/02/03 18:07:14.546]     transaction_timestamps=1
      [2020/02/03 18:07:14.546]     transaction-frequency=100
      [2020/02/03 18:07:14.546]     truncate=1
      [2020/02/03 18:07:14.546]     value_max=2047
      [2020/02/03 18:07:14.546]     value_min=9
      [2020/02/03 18:07:14.546]     verify=1
      [2020/02/03 18:07:14.546]     wiredtiger_config=
      [2020/02/03 18:07:14.546]     write_pct=5
      [2020/02/03 18:07:14.546]     ############################################ 

        1. image-2020-02-05-10-28-04-563.png
          12 kB
          Luke Chen
        2. Screen Shot 2020-02-06 at 10.57.19 am.png
          368 kB
          Tammy Bailey

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            luke.chen@mongodb.com Luke Chen
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: