-
Type: Task
-
Resolution: Done
-
Affects Version/s: None
-
Component/s: None
-
None
When running the 'small' configuration of the leveldb benchmark, I am seeing a deadlock. To reproduce:
DYLD_LIBRARY_PATH=../wiredtiger/build_posix/.libs:../wiredtiger/build_posix/ext/compressors/snappy/.libs/ ./db_bench_wiredtiger --cache_size=6537216 --threads=1 --benchmarks=fill100K
There is only 1 application thread. The threads are hung on the lsm_tree->rwlock. Here are the stacks of the relevant threads:
(gdb) thread apply all bt

Thread 7 (process 48237):
#0  0x00007fff9477a0fa in __psynch_cvwait ()
#1  0x00007fff96b3cfe9 in _pthread_cond_wait ()
#2  0x000000010ca68ed9 in __wt_cond_wait (session=0x7fdc43005880, cond=0x7fdc42800840, usecs=10000) at os_mtx.c:75
#3  0x000000010c9e0581 in __wt_cache_full_check (session=0x7fdc43005880, onepass=1) at cache.i:87
#4  0x000000010c9e00e0 in __cursor_leave (cbt=0x7fdc414048c0) at cursor.i:86
#5  0x000000010c9e353a in __wt_btcur_close (cbt=0x7fdc414048c0) at bt_cursor.c:742
#6  0x000000010ca40b6e in __curfile_close (cursor=0x7fdc414048c0) at cur_file.c:298
#7  0x000000010ca54323 in __clsm_close_cursors (clsm=0x7fdc41406100, update=1, skip_chunks=0) at lsm_cursor.c:100
#8  0x000000010ca546e9 in __clsm_open_cursors (clsm=0x7fdc41406100, update=1, start_chunk=0, start_id=0) at lsm_cursor.c:182
#9  0x000000010ca55d2c in __clsm_enter (clsm=0x7fdc41406100, update=1) at lsm_cursor.c:48
#10 0x000000010ca576d9 in __clsm_insert (cursor=0x7fdc41406100) at lsm_cursor.c:945
#11 0x000000010c950ad7 in DoWrite (this=0x7fff532b4870, thread=0x7fdc43014400, seq=false) at db_bench_wiredtiger.cc:917
#12 0x000000010c951002 in WriteRandom (this=0x7fff532b4870, thread=0x7fdc43014400) at db_bench_wiredtiger.cc:868
#13 0x000000010c955769 in leveldb::Benchmark::ThreadBody (v=0x7fdc42800200) at db_bench_wiredtiger.cc:661
#14 0x000000010c978772 in leveldb::(anonymous namespace)::StartThreadWrapper () at stl_vector.h:271
#15 0x00007fff96b387a2 in _pthread_start ()
#16 0x00007fff96b251e1 in thread_start ()

Thread 6 (process 48237):
#0  0x00007fff9477a1ae in __psynch_rw_wrlock ()
#1  0x00007fff96b3eea6 in pthread_rwlock_wrlock ()
#2  0x000000010ca69741 in __wt_writelock (session=0x7fdc43005aa0, rwlock=0x7fdc42803360) at os_mtx.c:239
#3  0x000000010ca611b6 in __wt_lsm_checkpoint_worker (arg=0x7fdc4302d600) at lsm_worker.c:291
#4  0x00007fff96b387a2 in _pthread_start ()
#5  0x00007fff96b251e1 in thread_start ()

Thread 5 (process 48237):
#0  0x00007fff9477a1ae in __psynch_rw_wrlock ()
#1  0x00007fff96b3eea6 in pthread_rwlock_wrlock ()
#2  0x000000010ca69741 in __wt_writelock (session=0x7fdc43005cc0, rwlock=0x7fdc42803360) at os_mtx.c:239
#3  0x000000010ca59b9c in __wt_lsm_merge (session=0x7fdc43005cc0, lsm_tree=0x7fdc4302d600, id=0, stalls=1) at lsm_merge.c:99
#4  0x000000010ca608db in __wt_lsm_merge_worker (vargs=0x7fdc428038b0) at lsm_worker.c:87
#5  0x00007fff96b387a2 in _pthread_start ()
#6  0x00007fff96b251e1 in thread_start ()

Thread 2 (process 48237):
#0  0x00007fff9477a0fa in __psynch_cvwait ()
#1  0x00007fff96b3cfe9 in _pthread_cond_wait ()
#2  0x000000010ca68ed9 in __wt_cond_wait (session=0x7fdc43005220, cond=0x7fdc428007c0, usecs=100000) at os_mtx.c:75
#3  0x000000010c9e90c0 in __wt_cache_evict_server (arg=0x7fdc43005220) at bt_evict.c:167
#4  0x00007fff96b387a2 in _pthread_start ()
#5  0x00007fff96b251e1 in thread_start ()
(gdb)
Thread 7 takes the lsm_tree->rwlock for reading in __clsm_close_cursors. Both WT_EVICT_STUCK and WT_EVICT_NO_PROGRESS are set. The application thread is stuck in __wt_cache_full_check even though onepass is 1, because the call to __wt_evict_lru_page continually returns WT_NOTFOUND.
So thread 7 is stuck in the endless loop in __wt_cache_full_check while holding the read lock, because __wt_evict_lru_page never returns anything other than WT_NOTFOUND. The LSM merge thread (thread 5) cannot make progress because it cannot get the write lock, and the LSM checkpoint worker (thread 6) is blocked in __wt_writelock on the same rwlock. The eviction thread (thread 2) never finds anything it can evict.
This is different from the no-progress report Alex mentions in WT-573.
- related to
-
WT-1 placeholder WT-1
- Closed
-
WT-2 What does metadata look like?
- Closed
-
WT-3 What file formats are required?
- Closed
-
WT-4 Flexible cursor traversals
- Closed
-
WT-5 How does pget work: is it necessary?
- Closed
-
WT-6 Complex schema example
- Closed
-
WT-7 Do we need the handle->err/errx methods?
- Closed
-
WT-8 Do we need table load, bulk-load and/or dump methods?
- Closed
-
WT-9 Does adding schema need to be transactional?
- Closed
-
WT-10 Basic "getting started" tutorial
- Closed
-
WT-11 placeholder #11
- Closed
-
WT-12 Write more examples
- Closed
-
WT-13 Define supported platforms
- Closed
-
WT-14 Windows build
- Closed
-
WT-15 Automated build/test infrastructure
- Closed
-
WT-16 Test suite
- Closed
-
WT-573 Unable to proceed due to cache full with LSM test/format
- Closed