Description
This issue was noticed while working on WT-2731, with the following test/format configuration:
############################################
# RUN PARAMETERS
############################################
abort=0
auto_throttle=1
backups=0
bitcnt=4
bloom=1
bloom_bit_count=4
bloom_hash_count=30
bloom_oldest=0
cache=5
checkpoints=1
checksum=uncompressed
chunk_size=4
compaction=0
compression=zlib
data_extend=0
data_source=file
delete_pct=35
dictionary=0
direct_io=0
encryption=rotn-7
evict_max=0
file_type=row-store
firstfit=1
huffman_key=0
huffman_value=0
in_memory=0
insert_pct=24
internal_key_truncation=1
internal_page_max=11
isolation=snapshot
key_gap=9
key_max=51
key_min=18
leaf_page_max=17
leak_memory=0
logging=0
logging_archive=0
logging_compression=none
logging_prealloc=0
long_running_txn=0
lsm_worker_threads=3
merge_max=16
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=6
quiet=1
repeat_data_pct=66
reverse=0
rows=100000
runs=1
rebalance=1
salvage=1
split_pct=69
statistics=1
statistics_server=0
threads=4
timer=20
transaction-frequency=87
value_max=3294
value_min=8
verify=1
wiredtiger_config=
write_pct=71
############################################
|
The cache dump looks like:
==========
cache dump
file:wt(<live>):
internal pages: 1 pages, 1545 max, 0MB total
leaf pages: 4 pages, 1392140 max, 4MB total
dirty pages: 1 pages, 1545 max, 0MB total
file:WiredTigerLAS.wt(<live>):
internal pages: 1 pages, 249 max, 0MB total
leaf pages: 1 pages, 412 max, 0MB total
dirty pages: 1 pages, 249 max, 0MB total
file:WiredTiger.wt(<live>):
internal pages: 1 pages, 249 max, 0MB total
dirty pages: 1 pages, 249 max, 0MB total
cache dump: total found = 5MB vs tracked inuse 5MB
==========
There are 4 clean leaf pages and 4 threads running snapshot isolation transactions, each pinning a single page. In this case, I'd expect the cache stuck check to fire, but it does not. After some time in a debugger, it appears that some eviction activity is still happening via the lookaside file:
(gdb) where
#0 __wt_cache_page_evict (session=0x632000001500, page=0x6080000d2020)
    at ../src/include/btree.i:302
#1 0x0000000000ac1d1e in __wt_page_out (session=0x632000001500, pagep=0x60400000d790)
    at ../src/btree/bt_discard.c:104
#2 0x0000000000ac0d93 in __wt_ref_out (session=0x632000001500, ref=0x60400000d790)
    at ../src/btree/bt_discard.c:33
#3 0x0000000000651ae5 in __evict_page_clean_update (session=0x632000001500,
    ref=0x60400000d790, closing=false) at ../src/evict/evict_page.c:224
#4 0x000000000064cfe0 in __wt_evict (session=<optimized out>, ref=<optimized out>,
    closing=<optimized out>) at ../src/evict/evict_page.c:121
#5 0x0000000000628dd6 in __evict_page (session=0x632000001500, is_server=true)
    at ../src/evict/evict_lru.c:1665
#6 0x0000000000639baa in __evict_lru_pages (session=0x632000001500, is_server=true)
    at ../src/evict/evict_lru.c:916
#7 0x000000000063b93b in __evict_pass (session=0x632000001500) at ../src/evict/evict_lru.c:677
#8 0x00000000006368ab in __evict_server (session=0x632000001500, did_work=0x7f68c5ffee30)
    at ../src/evict/evict_lru.c:271
#9 0x000000000061c5f9 in __evict_thread_run (arg=0x632000001500)
    at ../src/evict/evict_lru.c:207
#10 0x00007f68cace5df3 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f68c9ecf1ad in clone () from /lib64/libc.so.6
(gdb) p page->memory_footprint
$43 = 412
(gdb) p page->dsk
$44 = (const WT_PAGE_HEADER *) 0x6120002824c0
(gdb) p *$44
$45 = {recno = 0, write_gen = 123, mem_size = 316, u = {entries = 8, datalen = 8},
  type = 7 '\a', flags = 12 '\f', unused = "\000"}
(gdb) p session->dhandle->name
$46 = 0x60300000d750 "file:WiredTigerLAS.wt"
It is specifically the __wt_las_sweep function that is triggering cache activity:
(gdb) where
#0 __wt_las_sweep (session=0x632000001840) at ../src/cache/cache_las.c:289
#1 0x00000000005bf76d in __sweep_server (arg=0x632000001840) at ../src/conn/conn_sweep.c:283
#2 0x00007f68cace5df3 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f68c9ecf1ad in clone () from /lib64/libc.so.6
We should stop counting evictions of lookaside file pages toward the cache->evict_page count, so that the diagnostic stuck cache check fires as expected.
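
To illustrate the idea, here is a toy model of a progress-counter-based stuck check. It is only a sketch: every name in it (toy_cache, toy_file, toy_record_eviction, toy_cache_stuck, toy_simulate, the skip_lookaside flag and the "limit" of consecutive checks) is hypothetical and is not WiredTiger code, and the real check's mechanics are assumed rather than taken from the source tree. It shows how one lookaside page eviction per check interval keeps the counter moving and masks the stall, and how excluding those evictions lets the check fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; none of these names are WiredTiger's. */
struct toy_cache {
    uint64_t pages_evicted;     /* progress counter read by the stuck check */
    uint64_t last_seen_evicted; /* counter value at the previous check */
    unsigned stalled_checks;    /* consecutive checks with no progress */
};

struct toy_file {
    bool is_lookaside; /* true for the lookaside file */
};

/*
 * Record one page eviction. When skip_lookaside is set, evictions from
 * the lookaside file are not counted as progress (the proposed change).
 */
static void
toy_record_eviction(
    struct toy_cache *cache, const struct toy_file *file, bool skip_lookaside)
{
    assert(cache != NULL && file != NULL);
    if (skip_lookaside && file->is_lookaside)
        return;
    ++cache->pages_evicted;
}

/*
 * Periodic check: return true once the progress counter has not moved
 * for "limit" consecutive checks.
 */
static bool
toy_cache_stuck(struct toy_cache *cache, unsigned limit)
{
    if (cache->pages_evicted != cache->last_seen_evicted) {
        cache->last_seen_evicted = cache->pages_evicted;
        cache->stalled_checks = 0;
        return false;
    }
    return ++cache->stalled_checks >= limit;
}

/*
 * Simulate "rounds" check intervals in which the only eviction is a
 * single lookaside page (as the sweep in this ticket causes). Return
 * whether the stuck check ever fires.
 */
static bool
toy_simulate(bool skip_lookaside, unsigned rounds, unsigned limit)
{
    struct toy_cache cache = {0, 0, 0};
    struct toy_file las = {true};
    unsigned i;

    for (i = 0; i < rounds; ++i) {
        toy_record_eviction(&cache, &las, skip_lookaside);
        if (toy_cache_stuck(&cache, limit))
            return true;
    }
    return false;
}
```

In this model, toy_simulate(false, 10, 3) never reports a stuck cache because the lookaside evictions count as progress, while toy_simulate(true, 10, 3) reports it on the third check.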