Description
After the changes in WT-3009, there have been a number of stuck-cache aborts in test/format runs that use LSM.
They reproduce fairly quickly (in under 50 runs) on Linux with a config such as the one below:
############################################
# RUN PARAMETERS
############################################
abort=0
auto_throttle=1
backups=0
bitcnt=6
bloom=1
bloom_bit_count=45
bloom_hash_count=31
bloom_oldest=0
cache=30
checkpoints=1
checksum=uncompressed
chunk_size=1
compaction=0
compression=zlib
data_extend=0
data_source=lsm
delete_pct=14
dictionary=0
direct_io=0
encryption=none
evict_max=4
file_type=row-store
firstfit=0
huffman_key=0
huffman_value=0
in_memory=0
insert_pct=73
internal_key_truncation=0
internal_page_max=10
isolation=random
key_gap=12
key_max=64
key_min=26
leaf_page_max=17
leak_memory=0
logging=1
logging_archive=0
logging_compression=none
logging_prealloc=0
long_running_txn=0
lsm_worker_threads=4
merge_max=17
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=6
quiet=1
repeat_data_pct=29
reverse=0
rows=100000
runs=1
rebalance=1
salvage=1
split_pct=85
statistics=1
statistics_server=0
threads=11
timer=20
transaction-frequency=36
value_max=1638
value_min=15
verify=1
wiredtiger_config=
write_pct=42
############################################
One solution is to modify the changes WT-3009 made to the eviction trigger setting. The more correct option is likely to change how dirty page accounting works in LSM. Currently, dirty pages in the primary LSM chunk are counted towards the dirty page total. Because these dirty pages are fully expected, capped in size, and dealt with by LSM merges, they could potentially be excluded from the count.
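
To make the accounting idea concrete, here is a minimal Python model of the two counting schemes. It is only a sketch and not WiredTiger code: the `Page` class, the `in_primary_lsm_chunk` flag, the function names, the page sizes and the 20% trigger value are all hypothetical, and `cache=30` is assumed to mean a 30 MB cache.

```python
from dataclasses import dataclass

MB = 1 << 20


@dataclass
class Page:
    """Hypothetical stand-in for a cached page; not a WiredTiger structure."""
    size_bytes: int
    dirty: bool
    in_primary_lsm_chunk: bool = False  # assumed flag, for illustration only


def dirty_bytes(pages, exclude_primary_chunk):
    """Sum dirty bytes, optionally skipping pages in the primary LSM chunk."""
    total = 0
    for page in pages:
        if not page.dirty:
            continue
        if exclude_primary_chunk and page.in_primary_lsm_chunk:
            continue
        total += page.size_bytes
    return total


def over_dirty_trigger(pages, cache_bytes, trigger_pct, exclude_primary_chunk):
    """True when counted dirty bytes reach trigger_pct percent of the cache."""
    counted = dirty_bytes(pages, exclude_primary_chunk)
    return counted * 100 >= cache_bytes * trigger_pct


# Illustrative numbers only: a 30 MB cache and a made-up 20% dirty trigger.
cache_bytes = 30 * MB
trigger_pct = 20
pages = [
    Page(5 * MB, dirty=True, in_primary_lsm_chunk=True),  # primary chunk
    Page(2 * MB, dirty=True),                             # ordinary dirty page
    Page(10 * MB, dirty=False),                           # clean page
]

current = over_dirty_trigger(pages, cache_bytes, trigger_pct, False)
proposed = over_dirty_trigger(pages, cache_bytes, trigger_pct, True)
print(current, proposed)
```

In this model the current scheme counts 7 MB of dirty data and crosses the trigger, while the proposed scheme counts only the 2 MB outside the primary chunk and does not. Whether excluding those pages is safe in practice depends on the primary chunk really being size-capped and cleaned up by merges, as described above.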