Details
Task
Resolution: Done
Description
Running test/format with the following configuration:
############################################
# RUN PARAMETERS
############################################
# bitcnt not applicable to this run
cache=94
compression=bzip
data_extend=0
data_source=lsm
delete_pct=14
dictionary=0
file_type=row-store
hot_backups=0
huffman_key=0
huffman_value=0
insert_pct=40
internal_key_truncation=0
internal_page_max=14
key_gap=4
key_max=102
key_min=27
leaf_page_max=21
ops=382656
prefix=1
repeat_data_pct=37
reverse=0
rows=600067
runs=0
split_pct=65
threads=10
value_max=2186
value_min=3
# wiredtiger_config not applicable to this run
write_pct=5
############################################
The application ends up stuck: it is not making any progress at all. All application threads have the following call stack:
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
#1  0x0000000000423c5f in __wt_cond_wait (session=0x8c4b40, cond=0x8cb8a0,
    usecs=10000) at ../src/os_posix/os_mtx.c:75
#2  0x000000000044488a in __wt_cache_full_check (session=0x8c4b40, onepass=0)
    at ../src/include/cache.i:87
#3  0x000000000044498b in __wt_page_in_func (session=0x8c4b40,
    parent=0x7fffe8b9b550, ref=0x7fffe8b9bad0,
    file=0x66beb6 "../src/btree/row_srch.c", line=201)
    at ../src/btree/bt_page.c:47
#4  0x00000000004a3c3a in __wt_page_swap_func (session=0x8c4b40,
    out=0x7fffe8b9b550, in=0x7fffe8b9b550, inref=0x7fffe8b9bad0,
    file=0x66beb6 "../src/btree/row_srch.c", line=201)
    at ../src/include/btree.i:489
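For readers less familiar with this wait, the stack shows __wt_cache_full_check blocking in __wt_cond_wait with usecs=10000, i.e. a condition-variable wait in 10 ms slices followed by a re-check of the cache. The sketch below is a simplified, hypothetical model of that pattern, not WiredTiger's actual code: struct toy_cache, toy_deadline and toy_cache_full_check are invented names, and the max_waits bound exists only so the sketch cannot hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified cache state; not WiredTiger's WT_CACHE. */
struct toy_cache {
    pthread_mutex_t mtx;
    pthread_cond_t evict_done; /* would be signalled when eviction frees space */
    uint64_t bytes_inuse;
    uint64_t bytes_max;
};

/* Turn a relative timeout in microseconds into the absolute
 * CLOCK_REALTIME deadline that pthread_cond_timedwait() expects. */
static void toy_deadline(long usecs, struct timespec *ts)
{
    clock_gettime(CLOCK_REALTIME, ts);
    ts->tv_sec += usecs / 1000000;
    ts->tv_nsec += (usecs % 1000000) * 1000;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Wait until the cache is below its limit, waking every `usecs` to
 * re-check. max_waits is an artificial bound for this sketch; if eviction
 * never frees space, an unbounded version of this loop never returns.
 * Returns true if space became available. */
static bool toy_cache_full_check(struct toy_cache *c, long usecs, int max_waits)
{
    bool ok;
    int waits = 0;

    pthread_mutex_lock(&c->mtx);
    while (c->bytes_inuse >= c->bytes_max && waits < max_waits) {
        struct timespec ts;
        toy_deadline(usecs, &ts);
        /* A wakeup (0) or a timeout (ETIMEDOUT) both mean: re-check. */
        (void)pthread_cond_timedwait(&c->evict_done, &c->mtx, &ts);
        waits++;
    }
    ok = c->bytes_inuse < c->bytes_max;
    pthread_mutex_unlock(&c->mtx);
    return ok;
}
```

In this model, a full cache with no successful eviction simply keeps every caller cycling through timed waits, which matches the observed symptom of threads parked in the same frames without making progress.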
The eviction server is looping as expected and populating the eviction queue. However, the WT_EVICT_NO_PROGRESS flag is never cleared, so no pages are being evicted.
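One way to picture the flag behaviour described above is a "no progress until proven otherwise" scheme: set a no-progress flag at the start of each eviction pass, clear it whenever a page is actually evicted, and raise a stuck flag once it survives several passes in a row. This is a hypothetical sketch only; the TOY_* flag values, the pass hooks and the threshold-based promotion to "stuck" are assumptions for illustration, not how bt_evict.c is known to set WT_EVICT_NO_PROGRESS or WT_EVICT_STUCK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define TOY_EVICT_NO_PROGRESS 0x01u
#define TOY_EVICT_STUCK       0x02u

struct toy_evict_state {
    uint32_t flags;
    int idle_passes; /* consecutive passes that evicted nothing */
};

/* Start of an eviction-server pass: assume no progress until a page
 * is actually evicted. */
static void toy_pass_begin(struct toy_evict_state *s)
{
    s->flags |= TOY_EVICT_NO_PROGRESS;
}

/* Called whenever a page is successfully evicted. */
static void toy_page_evicted(struct toy_evict_state *s)
{
    s->flags &= ~TOY_EVICT_NO_PROGRESS;
}

/* End of a pass. If the no-progress flag survived the whole pass, count
 * it; after `threshold` such passes in a row, mark the cache stuck.
 * Any successful eviction resets the count and clears "stuck".
 * Returns true if the cache is currently considered stuck. */
static bool toy_pass_end(struct toy_evict_state *s, int threshold)
{
    if (s->flags & TOY_EVICT_NO_PROGRESS) {
        if (++s->idle_passes >= threshold)
            s->flags |= TOY_EVICT_STUCK;
    } else {
        s->idle_passes = 0;
        s->flags &= ~TOY_EVICT_STUCK;
    }
    return (s->flags & TOY_EVICT_STUCK) != 0;
}
```

Under this model, the reported state (queue being filled, no-progress flag never cleared, stuck flag set) is exactly what repeated passes with zero successful evictions would produce.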
The WT_EVICT_STUCK flag is set, but the clause at bt_evict.c:__evict_get_page:961 that aborts transactions never fires. I wonder whether the __wt_txn_oldest check is working as expected.
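To make the suspected failure mode concrete, here is a hypothetical model of an "abort the oldest transaction when stuck" clause. The names (toy_session, toy_txn_oldest, toy_get_page_check, TOY_TXN_NONE, TOY_ROLLBACK) are invented for this sketch and do not reproduce the real __evict_get_page or __wt_txn_oldest logic. It shows that such a clause only fires for the one session whose running transaction is the oldest; a session with no running transaction, or one that is not the oldest, never matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TXN_NONE 0     /* hypothetical: session has no running txn */
#define TOY_ROLLBACK (-1)  /* hypothetical error code for "roll back" */

/* Hypothetical per-session transaction state. */
struct toy_session {
    uint64_t txn_id; /* TOY_TXN_NONE if no transaction is running */
};

/* Smallest running transaction ID across all sessions, or TOY_TXN_NONE
 * if no session has a running transaction. */
static uint64_t toy_txn_oldest(const struct toy_session *s, size_t n)
{
    uint64_t oldest = TOY_TXN_NONE;
    for (size_t i = 0; i < n; i++)
        if (s[i].txn_id != TOY_TXN_NONE &&
            (oldest == TOY_TXN_NONE || s[i].txn_id < oldest))
            oldest = s[i].txn_id;
    return oldest;
}

/* Model of the abort clause: when the cache is stuck, only the session
 * holding the oldest running transaction is told to roll back so it can
 * release whatever it pins. Returns 0 to keep waiting. */
static int toy_get_page_check(bool cache_stuck, const struct toy_session *self,
    const struct toy_session *all, size_t n)
{
    if (cache_stuck && self->txn_id != TOY_TXN_NONE &&
        self->txn_id == toy_txn_oldest(all, n))
        return TOY_ROLLBACK;
    return 0;
}
```

If, in the stuck run, none of the threads reaching this clause holds the oldest transaction (or the oldest ID is computed from stale state), a check of this shape would never fire, which is one hypothesis worth ruling out.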
We need to find a way for eviction to make progress in this state. I suspect that all pages have open hazard references; I'll need to look more carefully at the state of the cache.
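A page cannot be evicted while any session holds a hazard reference to it, so one quick diagnostic for the hypothesis above is to count how many queued eviction candidates are currently pinned. The sketch below assumes a hypothetical layout (a fixed TOY_HAZARD_MAX slots per session, NULL meaning unused) rather than WiredTiger's real hazard-pointer structures; the names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: each session publishes up to TOY_HAZARD_MAX page
 * pointers it is currently using; NULL slots are unused. */
#define TOY_HAZARD_MAX 4

struct toy_hazard_session {
    const void *hazard[TOY_HAZARD_MAX];
};

/* True if any session has published a hazard reference to `page`. */
static bool toy_page_pinned(const void *page,
    const struct toy_hazard_session *s, size_t nsessions)
{
    for (size_t i = 0; i < nsessions; i++)
        for (size_t j = 0; j < TOY_HAZARD_MAX; j++)
            if (s[i].hazard[j] == page)
                return true;
    return false;
}

/* Count how many queued eviction candidates are pinned. If the result
 * equals the number of non-empty queue slots, no candidate can be
 * evicted on this pass. */
static size_t toy_count_pinned(const void *const *queue, size_t qlen,
    const struct toy_hazard_session *s, size_t nsessions)
{
    size_t pinned = 0;
    for (size_t i = 0; i < qlen; i++)
        if (queue[i] != NULL && toy_page_pinned(queue[i], s, nsessions))
            pinned++;
    return pinned;
}
```

In a debugger, the equivalent check would be to walk each session's hazard array and compare it against the entries in the eviction queue.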