- Type: Task
- Resolution: Done
- Affects Version/s: None
- Component/s: None
- None
Hi guys,
I got some more info on why we hang when running the "small" config of LevelDB Bench with four or more threads.
To reproduce:
env LD_LIBRARY_PATH=../wt-dev-branch/build_posix/.libs:../wt-dev-branch/build_posix/ext/compressors/snappy/.libs/ TEST_TMPDIR= ./db_bench_wiredtiger --cache_size=6537216 --threads=4 --db=/tmpfs/leveldb --benchmarks=fillrandom,overwrite,readrandom
The benchmark goes fine through the first two phases (fillrandom and overwrite), but then goes into an infinite loop in readrandom.
Examining the stack trace gives me the following:
#1 0x00007ffff795fdf9 in __wt_cache_full_check (session=0x6499b0) at ../src/include/cache.i:69
#2 __wt_page_in_func (session=0x6499b0, parent=0x7fffe4308a10, ref=0x7fffe4308ad0) at ../src/btree/bt_page.c:42
#3 0x00007ffff7984e40 in __wt_row_search (session=0x6499b0, cbt=0x7fffe41d5ed0, is_modify=0) at ../src/btree/row_srch.c:174
#4 0x00007ffff795301d in __wt_btcur_search (cbt=0x7fffe41d5ed0) at ../src/btree/bt_cursor.c:146
#5 0x00007ffff798e052 in __curfile_search (cursor=0x7fffe41d5ed0) at ../src/cursor/cur_file.c:133
#6 0x00007ffff7995b81 in __clsm_search (cursor=0x7fffe41caa50) at ../src/lsm/lsm_cursor.c:581
#7 0x0000000000404077 in leveldb::Benchmark::ReadRandom(leveldb::(anonymous namespace)::ThreadState*) ()
#8 0x000000000040897e in leveldb::Benchmark::ThreadBody(void*) ()
#9 0x0000000000432a3a in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
#10 0x00007ffff6f07d86 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff6c4066d in clone () from /lib64/libc.so.6
I checked every thread, and every one of them is stuck in the same place in an infinite loop. If we look at the code where we loop, we see the following:
for (wake = 0;; wake = (wake + 1) % 100) {
    WT_RET(__wt_eviction_check(session, &lockout, wake == 0));
    if (!lockout || F_ISSET(session,
        WT_SESSION_NO_CACHE_CHECK | WT_SESSION_SCHEMA_LOCKED))
        return (0);
    if (F_ISSET(btree,
        WT_BTREE_BULK | WT_BTREE_NO_CACHE | WT_BTREE_NO_EVICTION))
        return (0);
    if ((ret = __wt_evict_lru_page(session, 1)) == EBUSY)
        __wt_yield();
    else
        WT_RET_NOTFOUND_OK(ret);
}

What happens is that we drop all the way down to the last "else" statement, but don't return: the ret value is actually WT_NOTFOUND, which WT_RET_NOTFOUND_OK treats as success, so we stay in the loop forever. It appears that we can't find a page to evict.
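To make the control flow easier to see, here is a minimal standalone sketch of the loop. It is not the WiredTiger source. Every `sketch_` and `SKETCH_` name is hypothetical, the not-found value is only illustrative, and the macro mirrors my understanding of WT_RET_NOTFOUND_OK (return only on a non-zero error other than not-found). The two stubs model the state we appear to be in: the cache stays full and eviction never finds a candidate. The iteration cap exists only so the sketch terminates.

```c
#include <errno.h>

/* Illustrative stand-in; the real WT_NOTFOUND is defined by WiredTiger. */
#define SKETCH_NOTFOUND (-31803)

/* Assumed semantics of WT_RET_NOTFOUND_OK: return only on a "real" error. */
#define SKETCH_RET_NOTFOUND_OK(a) do {                          \
    int ret_;                                                   \
    if ((ret_ = (a)) != 0 && ret_ != SKETCH_NOTFOUND)           \
        return (ret_);                                          \
} while (0)

/* Hypothetical stub: the cache is always full, so the caller is locked out. */
static int
sketch_eviction_check(int *lockoutp)
{
    *lockoutp = 1;
    return (0);
}

/* Hypothetical stub: eviction never finds a page to evict. */
static int
sketch_evict_lru_page(void)
{
    return (SKETCH_NOTFOUND);
}

/*
 * Model of the cache-full loop. Returns 0 if the loop exits normally,
 * an error code on a real error, and -1 if max_iters iterations pass
 * without any exit (the real loop has no such cap).
 */
static int
sketch_cache_full_check(long max_iters)
{
    long i;
    int lockout, ret;

    for (i = 0; i < max_iters; ++i) {
        SKETCH_RET_NOTFOUND_OK(sketch_eviction_check(&lockout));
        if (!lockout)
            return (0);
        if ((ret = sketch_evict_lru_page()) == EBUSY)
            continue;                       /* real code yields here */
        else
            SKETCH_RET_NOTFOUND_OK(ret);    /* not-found is swallowed */
    }
    return (-1);
}
```

With these stubs, `sketch_cache_full_check(1000)` burns through all 1000 iterations and returns -1. Neither exit condition is ever met, which is the same shape as the hang above.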
- is related to
-
WT-441 Allow LSM trees to discard the btree handle from the active chunk.
- Closed
- related to
-
WT-443 Segfault in __wt_row_search
- Closed