Details
-
Task
-
Status: Closed
-
Major - P3
-
Resolution: Cannot Reproduce
-
None
-
None
-
None
Description
I noticed that wiredtiger-test-format had been running on tinderbox for 4 days. It is hung. The config:
############################################
|
# RUN PARAMETERS
|
############################################
|
abort=0
|
auto_throttle=1
|
firstfit=0
|
bitcnt=5
|
bloom=1
|
bloom_bit_count=17
|
bloom_hash_count=16
|
bloom_oldest=0
|
cache=1
|
checkpoints=1
|
checksum=uncompressed
|
chunk_size=8
|
compaction=0
|
compression=none
|
data_extend=0
|
data_source=file
|
delete_pct=31
|
dictionary=0
|
evict_max=0
|
file_type=row-store
|
backups=1
|
huffman_key=0
|
huffman_value=0
|
insert_pct=7
|
internal_key_truncation=1
|
internal_page_max=17
|
isolation=read-committed
|
key_gap=4
|
key_max=122
|
key_min=17
|
leak_memory=0
|
leaf_page_max=10
|
logging=0
|
logging_archive=1
|
logging_prealloc=0
|
logging=0
|
lsm_worker_threads=3
|
merge_max=6
|
mmap=1
|
ops=1000
|
prefix_compression=1
|
prefix_compression_min=0
|
repeat_data_pct=64
|
reverse=0
|
rows=10000
|
runs=50
|
split_pct=80
|
statistics=1
|
threads=1
|
timer=0
|
value_max=1107
|
value_min=18
|
wiredtiger_config=
|
write_pct=50
|
############################################
|
Attaching to the process here are the stacks:
(gdb) thread apply all bt
|
Thread 3 (Thread 0x7f013687d700 (LWP 127010)):
|
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
|
WT-1 0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc7b40, cond=0x1ce2a10, usecs=usecs@entry=100000) at ../src/os_posix/os_mtx_cond.c:77
|
WT-2 0x000000000042402e in __evict_server (arg=0x1cc7b40) at ../src/evict/evict_lru.c:195
|
WT-3 0x0000003f17e07ee5 in start_thread (arg=0x7f013687d700) at pthread_create.c:309
|
WT-4 0x0000003f176f4b8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
|
|
Thread 2 (Thread 0x7f013607c700 (LWP 127011)):
|
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
|
WT-1 0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc7e00, cond=0x1cf9250, usecs=usecs@entry=10000000) at ../src/os_posix/os_mtx_cond.c:77
|
WT-2 0x000000000041c744 in __sweep_server (arg=0x1cc7e00) at ../src/conn/conn_sweep.c:122
|
WT-3 0x0000003f17e07ee5 in start_thread (arg=0x7f013607c700) at pthread_create.c:309
|
WT-4 0x0000003f176f4b8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
|
|
Thread 1 (Thread 0x7f0136a8a740 (LWP 94803)):
|
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
|
WT-1 0x00000000004354af in __wt_cond_wait (session=session@entry=0x1cc80c0, cond=0x1ce2a80, usecs=usecs@entry=100000) at ../src/os_posix/os_mtx_cond.c:77
|
WT-2 0x00000000004c523c in __wt_cache_full_check (session=0x1cc80c0) at ../src/include/cache.i:166
|
WT-3 __cursor_enter (session=0x1cc80c0) at ../src/include/cursor.i:62
|
WT-4 __curfile_enter (cbt=<optimized out>, cbt=<optimized out>) at ../src/include/cursor.i:97
|
WT-5 __cursor_func_init (reenter=1, cbt=0x24a2c50) at ../src/include/cursor.i:184
|
WT-6 __wt_btcur_search (cbt=cbt@entry=0x24a2c50) at ../src/btree/bt_cursor.c:307
|
WT-7 0x0000000000493533 in __curfile_search (cursor=0x24a2c50) at ../src/cursor/cur_file.c:166
|
WT-8 0x00000000004127a4 in read_row (keyno=2, key=0x7fff0744e670, cursor=0x24a2c50) at ../../../test/format/ops.c:601
|
WT-9 wts_read_scan () at ../../../test/format/ops.c:560
|
WT-10 0x000000000040d8d7 in main (argc=<optimized out>, argv=<optimized out>) at ../../../test/format/t.c:187
|
This is a tiny cache of 1Mb.
(gdb) p pages_evicted
|
$2 = 2166
|
(gdb) p *cache
|
$3 = {bytes_inmem = 6458665, pages_inmem = 2169, bytes_evict = 5202306, pages_evict = 2166, bytes_dirty = 1891, pages_dirty = 1, read_gen = 4450648,
|
evict_cond = 0x1ce2a10, evict_lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {
|
__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, counter = 0, name = 0x65e6a1 "cache eviction", id = 0 '\000',
|
initialized = 1 '\001'}, evict_walk_lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
|
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, counter = 0, name = 0x65e6b0 "cache walk", id = 0 '\000',
|
initialized = 1 '\001'}, evict_waiter_cond = 0x1ce2a80, eviction_trigger = 95, eviction_target = 80, eviction_dirty_target = 80, evict = 0x1ce2af0,
|
evict_current = 0x0, evict_candidates = 0, evict_entries = 0, evict_max = 0, evict_slots = 400, evict_file_next = 0x0, sync_request = 0, sync_complete = 0,
|
cp_saved_evict = 0, cp_current_evict = 0, cp_skip_count = 0, cp_reserved = 0, cp_session = 0x0, cp_tid = 0, flags = 16}
|
(gdb) p conn->cache_size
|
$5 = 1048576
|
We have one, very large (larger than cache), internal page that says it is already on disk:
(gdb) p *ref->page
|
$32 = {u = {intl = {recno = 0, parent_ref = 0x1cfa0c0, __index = 0x24e6920}, row = {d = 0x0, ins = 0x1cfa0c0, upd = 0x24e6920, entries = 0}, col_fix = {
|
recno = 0, bitf = 0x1cfa0c0 "\360:\373\001", entries = 38693152}, col_var = {recno = 0, d = 0x1cfa0c0, repeats = 0x24e6920, nrepeats = 0, entries = 0}},
|
dsk = 0x1fb3c00, modify = 0x0, read_gen = 0, memory_footprint = 1254123, type = 6 '\006', flags_atomic = 2 '\002'}
|
(gdb) p *ref
|
$35 = {page = 0x0, home = 0x1fb3af0, ref_hint = 0, state = WT_REF_DISK, addr = 0x1fb3c2a, key = {recno = 4294967379, ikey = 0x100000053}, page_del = 0x0}
|
Attachments
Issue Links
- is related to
-
WT-1613 test/format stress hang - eviction stuck
- Closed
-
WT-1646 test/format stress hang
- Closed
-
WT-1759 Limit size of internal pages
- Closed
-
WT-1788 Review eviction server algorithm
-
- Closed
-
-
WT-1851 test/format hung in LSM tree.
-
- Closed
-
- related to
-
WT-1 placeholder WT-1
- Closed
-
WT-2 What does metadata look like?
- Closed
-
WT-3 What file formats are required?
- Closed
-
WT-4 Flexible cursor traversals
- Closed
-
WT-5 How does pget work: is it necessary?
- Closed
-
WT-6 Complex schema example
- Closed
-
WT-7 Do we need the handle->err/errx methods?
- Closed
-
WT-8 Do we need table load, bulk-load and/or dump methods?
- Closed
-
WT-9 Does adding schema need to be transactional?
- Closed
-
WT-10 Basic "getting started" tutorial
- Closed