Details
-
Bug
-
Status: Closed
-
Major - P3
-
Resolution: Fixed
-
None
-
None
-
8
-
Storage Engines 2020-02-10, Storage Engines 2020-02-24
Description
Summary:
Change reconciliation to detect and fail evictions that will eventually fail because of a running checkpoint.
Change format to not configure WiredTiger library threads to perform checkpoints during bulk load, it leads to cache stuck failures we're not interested in fixing.
The direct I/O Evergreen task recently captured a few cache stuck dirty cases, the size of the dirty leaf pages seem large (2-digits MB), after the timeout issue being resolved by WT-5458.
The cache dump of the last failure:
[2020/02/03 18:07:14.544] t: process 49959 running
|
[2020/02/03 18:07:14.544] [1580713634:451499][49959:0x7f51904f5700], t, eviction-server: __evict_server, 440: Cache stuck for too long, giving up: Connection timed out
|
[2020/02/03 18:07:14.544] 0x7f5192866690:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
|
[2020/02/03 18:07:14.544] 0x7f5192866690:transaction state dump
|
[2020/02/03 18:07:14.544] 0x7f5192866690:current ID: 60
|
[2020/02/03 18:07:14.544] 0x7f5192866690:last running ID: 60
|
[2020/02/03 18:07:14.544] 0x7f5192866690:metadata_pinned ID: 59
|
[2020/02/03 18:07:14.544] 0x7f5192866690:oldest ID: 60
|
[2020/02/03 18:07:14.544] 0x7f5192866690:durable timestamp: (0, 100)
|
[2020/02/03 18:07:14.544] 0x7f5192866690:oldest timestamp: (0, 98)
|
[2020/02/03 18:07:14.544] 0x7f5192866690:pinned timestamp: (0, 98)
|
[2020/02/03 18:07:14.544] 0x7f5192866690:stable timestamp: (0, 0)
|
[2020/02/03 18:07:14.544] 0x7f5192866690:has_durable_timestamp: yes
|
[2020/02/03 18:07:14.544] 0x7f5192866690:has_oldest_timestamp: yes
|
[2020/02/03 18:07:14.544] 0x7f5192866690:has_pinned_timestamp: yes
|
[2020/02/03 18:07:14.544] 0x7f5192866690:has_stable_timestamp: no
|
[2020/02/03 18:07:14.544] 0x7f5192866690:oldest_is_pinned: yes
|
[2020/02/03 18:07:14.544] 0x7f5192866690:stable_is_pinned: no
|
[2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint running: yes
|
[2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint generation: 6
|
[2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint pinned ID: 59
|
[2020/02/03 18:07:14.544] 0x7f5192866690:checkpoint txn ID: 59
|
[2020/02/03 18:07:14.544] 0x7f5192866690:oldest named snapshot ID: 0
|
[2020/02/03 18:07:14.544] 0x7f5192866690:session count: 16
|
[2020/02/03 18:07:14.544] 0x7f5192866690:Transaction state of active sessions:
|
[2020/02/03 18:07:14.544] 0x7f5192866690:=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
|
[2020/02/03 18:07:14.544] 0x7f5192866690:cache dump
|
[2020/02/03 18:07:14.544] 0x7f5192866690:cache full: no
|
[2020/02/03 18:07:14.544] 0x7f5192866690:cache clean check: no (36.764%)
|
[2020/02/03 18:07:14.544] 0x7f5192866690:cache dirty check: yes (26.423%)
|
[2020/02/03 18:07:14.544] 0x7f5192866690:file:wt.wt(<live>):
|
[2020/02/03 18:07:14.544] 0x7f5192866690:internal: 146 pages, 5MB, 144/2 clean/dirty pages, 5/0 clean/dirty MB, 0MB max page, 0MB max dirty page
|
[2020/02/03 18:07:14.544] 0x7f5192866690:leaf: 6557 pages, 79MB, 6556/1 clean/dirty pages, 18/61 clean/dirty MB, 61MB max page, 61MB max dirty page
|
[2020/02/03 18:07:14.544] 0x7f5192866690:file:WiredTigerLAS.wt(<live>) eviction disabled at open:
|
[2020/02/03 18:07:14.544] 0x7f5192866690:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
|
[2020/02/03 18:07:14.544] 0x7f5192866690:leaf: 0 pages
|
[2020/02/03 18:07:14.544] 0x7f5192866690:file:WiredTiger.wt(<live>):
|
[2020/02/03 18:07:14.544] 0x7f5192866690:internal: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
|
[2020/02/03 18:07:14.544] 0x7f5192866690:leaf: 1 pages, 0MB, 1/0 clean/dirty pages, 0/0 clean/dirty MB, 0MB max page, 0MB max dirty page
|
[2020/02/03 18:07:14.544] 0x7f5192866690:cache dump: total found: 92MB vs tracked inuse 92MB
|
[2020/02/03 18:07:14.544] 0x7f5192866690:total dirty bytes: 62MB
|
[2020/02/03 18:07:14.544] [1580713634:455422][49959:0x7f51904f5700], t, eviction-server: __wt_evict_thread_run, 315: cache eviction thread error: Connection timed out
|
[2020/02/03 18:07:14.544] [1580713634:455441][49959:0x7f51904f5700], t, eviction-server: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
|
[2020/02/03 18:07:14.544] [1580713634:455450][49959:0x7f51904f5700], t, eviction-server: __wt_abort, 28: aborting WiredTiger library
|
[2020/02/03 18:07:14.544] format.sh: /data/mci/0ddb079853f71bbf59cb2fc376e99471/wiredtiger/test/format/RUNDIR.34/CONFIG:
|
The configuration:
[2020/02/03 18:07:14.545] ############################################
|
[2020/02/03 18:07:14.545] # RUN PARAMETERS
|
[2020/02/03 18:07:14.545] ############################################
|
[2020/02/03 18:07:14.545] abort=0
|
[2020/02/03 18:07:14.545] alter=0
|
[2020/02/03 18:07:14.545] assert_commit_timestamp=0
|
[2020/02/03 18:07:14.545] assert_read_timestamp=0
|
[2020/02/03 18:07:14.545] auto_throttle=1
|
[2020/02/03 18:07:14.545] backups=0
|
[2020/02/03 18:07:14.545] bitcnt=8
|
[2020/02/03 18:07:14.545] bloom=1
|
[2020/02/03 18:07:14.545] bloom_bit_count=26
|
[2020/02/03 18:07:14.545] bloom_hash_count=17
|
[2020/02/03 18:07:14.545] bloom_oldest=0
|
[2020/02/03 18:07:14.545] cache=252
|
[2020/02/03 18:07:14.545] cache_minimum=20
|
[2020/02/03 18:07:14.545] checkpoints=wiredtiger
|
[2020/02/03 18:07:14.545] checkpoint_log_size=159
|
[2020/02/03 18:07:14.545] checkpoint_wait=20
|
[2020/02/03 18:07:14.545] checksum=uncompressed
|
[2020/02/03 18:07:14.545] chunk_size=1
|
[2020/02/03 18:07:14.545] compaction=0
|
[2020/02/03 18:07:14.545] compression=none
|
[2020/02/03 18:07:14.545] data_extend=0
|
[2020/02/03 18:07:14.545] data_source=table
|
[2020/02/03 18:07:14.545] delete_pct=15
|
[2020/02/03 18:07:14.545] dictionary=1
|
[2020/02/03 18:07:14.545] direct_io=1
|
[2020/02/03 18:07:14.545] encryption=rotn-7
|
[2020/02/03 18:07:14.545] evict_max=5
|
[2020/02/03 18:07:14.545] file_type=row-store
|
[2020/02/03 18:07:14.545] firstfit=0
|
[2020/02/03 18:07:14.545] huffman_key=0
|
[2020/02/03 18:07:14.545] huffman_value=0
|
[2020/02/03 18:07:14.545] independent_thread_rng=1
|
[2020/02/03 18:07:14.545] in_memory=0
|
[2020/02/03 18:07:14.545] insert_pct=52
|
[2020/02/03 18:07:14.545] internal_key_truncation=1
|
[2020/02/03 18:07:14.545] internal_page_max=9
|
[2020/02/03 18:07:14.545] isolation=snapshot
|
[2020/02/03 18:07:14.545] key_gap=1
|
[2020/02/03 18:07:14.545] key_max=27
|
[2020/02/03 18:07:14.545] key_min=25
|
[2020/02/03 18:07:14.545] leaf_page_max=12
|
[2020/02/03 18:07:14.545] leak_memory=0
|
[2020/02/03 18:07:14.545] logging=0
|
[2020/02/03 18:07:14.545] logging_archive=1
|
[2020/02/03 18:07:14.545] logging_compression=none
|
[2020/02/03 18:07:14.545] logging_file_max=283583
|
[2020/02/03 18:07:14.545] logging_prealloc=0
|
[2020/02/03 18:07:14.545] lsm_worker_threads=3
|
[2020/02/03 18:07:14.545] major_timeout=0
|
[2020/02/03 18:07:14.545] memory_page_max=6
|
[2020/02/03 18:07:14.545] merge_max=4
|
[2020/02/03 18:07:14.545] mmap=1
|
[2020/02/03 18:07:14.545] modify_pct=2
|
[2020/02/03 18:07:14.545] ops=0
|
[2020/02/03 18:07:14.545] prefix_compression=0
|
[2020/02/03 18:07:14.545] prefix_compression_min=7
|
[2020/02/03 18:07:14.545] prepare=0
|
[2020/02/03 18:07:14.545] quiet=1
|
[2020/02/03 18:07:14.545] random_cursor=0
|
[2020/02/03 18:07:14.546] read_pct=26
|
[2020/02/03 18:07:14.546] rebalance=0
|
[2020/02/03 18:07:14.546] repeat_data_pct=34
|
[2020/02/03 18:07:14.546] reverse=1
|
[2020/02/03 18:07:14.546] rows=1000000
|
[2020/02/03 18:07:14.546] runs=1
|
[2020/02/03 18:07:14.546] salvage=0
|
[2020/02/03 18:07:14.546] split_pct=57
|
[2020/02/03 18:07:14.546] statistics=0
|
[2020/02/03 18:07:14.546] statistics_server=0
|
[2020/02/03 18:07:14.546] threads=21
|
[2020/02/03 18:07:14.546] timer=4
|
[2020/02/03 18:07:14.546] timing_stress_aggressive_sweep=0
|
[2020/02/03 18:07:14.546] timing_stress_checkpoint=0
|
[2020/02/03 18:07:14.546] timing_stress_lookaside_sweep=0
|
[2020/02/03 18:07:14.546] timing_stress_split_1=0
|
[2020/02/03 18:07:14.546] timing_stress_split_2=0
|
[2020/02/03 18:07:14.546] timing_stress_split_3=0
|
[2020/02/03 18:07:14.546] timing_stress_split_4=0
|
[2020/02/03 18:07:14.546] timing_stress_split_5=0
|
[2020/02/03 18:07:14.546] timing_stress_split_6=0
|
[2020/02/03 18:07:14.546] timing_stress_split_7=0
|
[2020/02/03 18:07:14.546] timing_stress_split_8=0
|
[2020/02/03 18:07:14.546] transaction_timestamps=1
|
[2020/02/03 18:07:14.546] transaction-frequency=100
|
[2020/02/03 18:07:14.546] truncate=1
|
[2020/02/03 18:07:14.546] value_max=2047
|
[2020/02/03 18:07:14.546] value_min=9
|
[2020/02/03 18:07:14.546] verify=1
|
[2020/02/03 18:07:14.546] wiredtiger_config=
|
[2020/02/03 18:07:14.546] write_pct=5
|
[2020/02/03 18:07:14.546] ############################################
|
Attachments
Issue Links
- is duplicated by
-
WT-5544 Fix cache stuck during salvage
-
- Closed
-