-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
8
-
Storage Engines 2020-02-10, Storage Engines 2020-02-24
Summary:
Change reconciliation to detect and fail evictions that will eventually fail because of a running checkpoint.
Change format to not configure WiredTiger library threads to perform checkpoints during bulk load, it leads to cache stuck failures we're not interested in fixing.
The direct I/O Evergreen task recently captured a few cache stuck dirty cases, the size of the dirty leaf pages seem large (2-digits MB), after the timeout issue being resolved by WT-5458.
The cache dump of the last failure:
[2020/02/03 18:07:14.544] t: process 49959 running [2020/02/03 18:07:14.544] [1580713634:451499][49959:0x7f51904f5700], t, eviction-server: __evict_server, 440: Cache stuck for too long, giving up: Connection timed out [2020/02/03 18:07:14.544] 0x7f5192866690:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= [2020/02/03 18:07:14.544] 0x7f5192866690:transaction state dump [2020/02/03 18:07:14.544] 0x7f5192866690:current ID: 60 [2020/02/03 18:07:14.544] 0x7f5192866690:last running ID: 60 [2020/02/03 18:07:14.544] 0x7f5192866690:metadata_pinned ID: 59 [2020/02/03 18:07:14.544] 0x7f5192866690:oldest ID: 60 [2020/02/03 18:07:14.544] 0x7f5192866690:durable timestamp: (0, 100) [2020/02/03 18:07:14.544] 0x7f5192866690:oldest timestamp: (0, 98) [2020/02/03 18:07:14.544] 0x7f5192866690:pinned timestamp: (0, 98) [2020/02/03 18:07:14.544] 0x7f5192866690:stable timestamp: (0, 0) [2020/02/03 18:07:14.544] 0x7f5192866690:has_durable_timestamp: yes [2020/02/03 18:07:14.544] 0x7f5192866690:has_oldest_timestamp: yes [2020/02/03 18:07:14.544] 0x7f5192866690:has_pinned_timestamp: yes [2020/02/03 18:07:14.544] 0x7f5192866690:has_stable_timestamp: no [2020/02/03 18:07:14.544] 0x7f5192866690:oldest_is_pinned: yes [2020/02/03 18:07:14.544] 0x7f5192866690:stable_is_pinned: no [2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint running: yes [2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint generation: 6 [2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint pinned ID: 59 [2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint txn ID: 59 [2020/02/03 18:07:14.544] 0x7f5192866690:oldest named snapshot ID: 0 [2020/02/03 18:07:14.544] 0x7f5192866690:session count: 16 [2020/02/03 18:07:14.544] 0x7f5192866690:Transaction state of active sessions: [2020/02/03 18:07:14.544] 0x7f5192866690:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= [2020/02/03 18:07:14.544] 0x7f5192866690:cache dump [2020/02/03 18:07:14.544] 0x7f5192866690:cache full: no [2020/02/03 18:07:14.544] 0x7f5192866690:cache clean check: no (36.764%) [2020/02/03 18:07:14.544] 0x7f5192866690:cache dirty check: yes (26.423%) [2020/02/03 18:07:14.544] 0x7f5192866690:file:wt.wt(<live>): [2020/02/03 18:07:14.544] 0x7f5192866690:internal: 146 pages, 5MB, 144/2 clean/dirty pages, 5/0 clean/dirty MB, 0MB max page, 0MB max dirty page [2020/02/03 18:07:14.544] 0x7f5192866690:leaf: 6557 pages, 79MB, 6556/1 clean/dirty pages, 18/61 clean/dirty MB, 61MB max page, 61MB max dirty page [2020/02/03 18:07:14.544] 0x7f5192866690:file:WiredTigerLAS.wt(<live>) eviction disabled at open: [2020/02/03 18:07:14.544] 0x7f5192866690:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page [2020/02/03 18:07:14.544] 0x7f5192866690:leaf: 0 pages [2020/02/03 18:07:14.544] 0x7f5192866690:file:WiredTiger.wt(<live>): [2020/02/03 18:07:14.544] 0x7f5192866690:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page [2020/02/03 18:07:14.544] 0x7f5192866690:leaf: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page [2020/02/03 18:07:14.544] 0x7f5192866690:cache dump: total found: 92MB vs tracked inuse 92MB [2020/02/03 18:07:14.544] 0x7f5192866690:total dirty bytes: 62MB [2020/02/03 18:07:14.544] [1580713634:455422][49959:0x7f51904f5700], t, eviction-server: __wt_evict_thread_run, 315: cache eviction thread error: Connection timed out [2020/02/03 18:07:14.544] [1580713634:455441][49959:0x7f51904f5700], t, eviction-server: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic [2020/02/03 18:07:14.544] [1580713634:455450][49959:0x7f51904f5700], t, eviction-server: __wt_abort, 28: aborting WiredTiger library [2020/02/03 18:07:14.544] format.sh: /data/mci/0ddb079853f71bbf59cb2fc376e99471/wiredtiger/test/format/RUNDIR.34/CONFIG:
The configuration:
[2020/02/03 18:07:14.545] ############################################ [2020/02/03 18:07:14.545] # RUN PARAMETERS [2020/02/03 18:07:14.545] ############################################ [2020/02/03 18:07:14.545] abort=0 [2020/02/03 18:07:14.545] alter=0 [2020/02/03 18:07:14.545] assert_commit_timestamp=0 [2020/02/03 18:07:14.545] assert_read_timestamp=0 [2020/02/03 18:07:14.545] auto_throttle=1 [2020/02/03 18:07:14.545] backups=0 [2020/02/03 18:07:14.545] bitcnt=8 [2020/02/03 18:07:14.545] bloom=1 [2020/02/03 18:07:14.545] bloom_bit_count=26 [2020/02/03 18:07:14.545] bloom_hash_count=17 [2020/02/03 18:07:14.545] bloom_oldest=0 [2020/02/03 18:07:14.545] cache=252 [2020/02/03 18:07:14.545] cache_minimum=20 [2020/02/03 18:07:14.545] checkpoints=wiredtiger [2020/02/03 18:07:14.545] checkpoint_log_size=159 [2020/02/03 18:07:14.545] checkpoint_wait=20 [2020/02/03 18:07:14.545] checksum=uncompressed [2020/02/03 18:07:14.545] chunk_size=1 [2020/02/03 18:07:14.545] compaction=0 [2020/02/03 18:07:14.545] compression=none [2020/02/03 18:07:14.545] data_extend=0 [2020/02/03 18:07:14.545] data_source=table [2020/02/03 18:07:14.545] delete_pct=15 [2020/02/03 18:07:14.545] dictionary=1 [2020/02/03 18:07:14.545] direct_io=1 [2020/02/03 18:07:14.545] encryption=rotn-7 [2020/02/03 18:07:14.545] evict_max=5 [2020/02/03 18:07:14.545] file_type=row-store [2020/02/03 18:07:14.545] firstfit=0 [2020/02/03 18:07:14.545] huffman_key=0 [2020/02/03 18:07:14.545] huffman_value=0 [2020/02/03 18:07:14.545] independent_thread_rng=1 [2020/02/03 18:07:14.545] in_memory=0 [2020/02/03 18:07:14.545] insert_pct=52 [2020/02/03 18:07:14.545] internal_key_truncation=1 [2020/02/03 18:07:14.545] internal_page_max=9 [2020/02/03 18:07:14.545] isolation=snapshot [2020/02/03 18:07:14.545] key_gap=1 [2020/02/03 18:07:14.545] key_max=27 [2020/02/03 18:07:14.545] key_min=25 [2020/02/03 18:07:14.545] leaf_page_max=12 [2020/02/03 18:07:14.545] leak_memory=0 [2020/02/03 18:07:14.545] logging=0 [2020/02/03 18:07:14.545] logging_archive=1 [2020/02/03 18:07:14.545] logging_compression=none [2020/02/03 18:07:14.545] logging_file_max=283583 [2020/02/03 18:07:14.545] logging_prealloc=0 [2020/02/03 18:07:14.545] lsm_worker_threads=3 [2020/02/03 18:07:14.545] major_timeout=0 [2020/02/03 18:07:14.545] memory_page_max=6 [2020/02/03 18:07:14.545] merge_max=4 [2020/02/03 18:07:14.545] mmap=1 [2020/02/03 18:07:14.545] modify_pct=2 [2020/02/03 18:07:14.545] ops=0 [2020/02/03 18:07:14.545] prefix_compression=0 [2020/02/03 18:07:14.545] prefix_compression_min=7 [2020/02/03 18:07:14.545] prepare=0 [2020/02/03 18:07:14.545] quiet=1 [2020/02/03 18:07:14.545] random_cursor=0 [2020/02/03 18:07:14.546] read_pct=26 [2020/02/03 18:07:14.546] rebalance=0 [2020/02/03 18:07:14.546] repeat_data_pct=34 [2020/02/03 18:07:14.546] reverse=1 [2020/02/03 18:07:14.546] rows=1000000 [2020/02/03 18:07:14.546] runs=1 [2020/02/03 18:07:14.546] salvage=0 [2020/02/03 18:07:14.546] split_pct=57 [2020/02/03 18:07:14.546] statistics=0 [2020/02/03 18:07:14.546] statistics_server=0 [2020/02/03 18:07:14.546] threads=21 [2020/02/03 18:07:14.546] timer=4 [2020/02/03 18:07:14.546] timing_stress_aggressive_sweep=0 [2020/02/03 18:07:14.546] timing_stress_checkpoint=0 [2020/02/03 18:07:14.546] timing_stress_lookaside_sweep=0 [2020/02/03 18:07:14.546] timing_stress_split_1=0 [2020/02/03 18:07:14.546] timing_stress_split_2=0 [2020/02/03 18:07:14.546] timing_stress_split_3=0 [2020/02/03 18:07:14.546] timing_stress_split_4=0 [2020/02/03 18:07:14.546] timing_stress_split_5=0 [2020/02/03 18:07:14.546] timing_stress_split_6=0 [2020/02/03 18:07:14.546] timing_stress_split_7=0 [2020/02/03 18:07:14.546] timing_stress_split_8=0 [2020/02/03 18:07:14.546] transaction_timestamps=1 [2020/02/03 18:07:14.546] transaction-frequency=100 [2020/02/03 18:07:14.546] truncate=1 [2020/02/03 18:07:14.546] value_max=2047 [2020/02/03 18:07:14.546] value_min=9 [2020/02/03 18:07:14.546] verify=1 [2020/02/03 18:07:14.546] wiredtiger_config= [2020/02/03 18:07:14.546] write_pct=5 [2020/02/03 18:07:14.546] ############################################
- is duplicated by
-
WT-5544 Fix cache stuck during salvage
- Closed