- Type: Task
- Status: Closed
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Component/s: None
- Labels:
There is a test/format LSM job that got stuck. The configuration file is:
############################################
# RUN PARAMETERS
############################################
abort=0
auto_throttle=1
firstfit=0
bitcnt=2
bloom=1
bloom_bit_count=4
bloom_hash_count=24
bloom_oldest=0
cache=30
checkpoints=1
checksum=uncompressed
chunk_size=1
compaction=0
compression=zlib
data_extend=0
data_source=lsm
delete_pct=18
dictionary=0
evict_max=5
file_type=row-store
backups=0
huffman_key=0
huffman_value=0
insert_pct=83
internal_key_truncation=1
internal_page_max=14
isolation=read-uncommitted
key_gap=1
key_max=122
key_min=10
leak_memory=0
leaf_page_max=11
logging=0
logging_archive=1
logging_prealloc=1
logging=0
lsm_worker_threads=4
merge_max=13
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=6
repeat_data_pct=54
reverse=0
rows=100000
runs=100
split_pct=67
statistics=0
statistics_server=0
threads=21
timer=0
value_max=3202
value_min=15
wiredtiger_config=
write_pct=66
############################################
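When triaging a dump like this, it can help to load it into a dictionary and flag repeated keys (`logging` appears twice in the listing above). A minimal sketch, assuming the file is plain `name=value` lines with `#` comment lines; `parse_config` and the embedded sample are illustrative and are not part of test/format:

```python
def parse_config(text):
    """Parse name=value lines; return (settings, duplicated_keys)."""
    settings = {}
    duplicates = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment/banner lines.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in settings and key not in duplicates:
            duplicates.append(key)
        # Assumption for this sketch only: the last occurrence wins.
        settings[key] = value.strip()
    return settings, duplicates


# A few lines copied from the configuration in this ticket.
SAMPLE = """\
# RUN PARAMETERS
cache=30
chunk_size=1
data_source=lsm
logging=0
logging_archive=1
logging=0
lsm_worker_threads=4
wiredtiger_config=
"""

settings, duplicates = parse_config(SAMPLE)
print(settings["lsm_worker_threads"], duplicates)
```

Values are kept as strings here; how test/format itself interprets a repeated key is not something this sketch tries to reproduce.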
The LSM tree has 20 active chunks. Of those chunks 5 are flushed, the rest are all in memory. The non-flushed chunks are filling the cache.
There are 4 LSM worker threads. One is the manager. One can only do switch and drop operations, and that thread is idle. One is currently doing a merge, but is stuck waiting because the cache is full:
Thread 28 (Thread 0x7f6dfc378700 (LWP 110983)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00000000004402f6 in __wt_cond_wait (session=0x2525fe0, cond=0x25208c0,
    usecs=100000) at ../src/os_posix/os_mtx_cond.c:78
#2 0x000000000042c031 in __wt_cache_wait (session=0x2525fe0, full=131)
    at ../src/evict/evict_lru.c:1464
#3 0x00000000004dd1cf in __wt_cache_full_check (session=0x2525fe0)
    at ../src/include/cache.i:197
#4 0x00000000004de510 in __cursor_enter (session=0x2525fe0)
    at ../src/include/cursor.i:63
#5 0x00000000004de5d8 in __curfile_enter (cbt=0x7f6de4061b20)
    at ../src/include/cursor.i:96
#6 0x00000000004de791 in __cursor_func_init (cbt=0x7f6de4061b20, reenter=0)
    at ../src/include/cursor.i:198
#7 0x00000000004e00a9 in __wt_btcur_next (cbt=0x7f6de4061b20, truncating=0)
    at ../src/btree/bt_curnext.c:415
#8 0x00000000004b144b in __curfile_next (cursor=0x7f6de4061b20)
    at ../src/cursor/cur_file.c:113
#9 0x00000000004c7373 in __clsm_next (cursor=0x7f6de4183e10)
    at ../src/lsm/lsm_cursor.c:795
#10 0x00000000004cb4cc in __wt_lsm_merge (session=0x2525fe0,
    lsm_tree=0x25034e0, id=2) at ../src/lsm/lsm_merge.c:346
#11 0x000000000043b497 in __lsm_worker (arg=0x2517920)
    at ../src/lsm/lsm_worker.c:138
The remaining thread is creating a bloom filter, and is also stuck waiting for the cache to become less full:
Thread 27 (Thread 0x7f6dfbb77700 (LWP 110984)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00000000004402f6 in __wt_cond_wait (session=0x25262e0, cond=0x25208c0,
    usecs=100000) at ../src/os_posix/os_mtx_cond.c:78
#2 0x000000000042c031 in __wt_cache_wait (session=0x25262e0, full=131)
    at ../src/evict/evict_lru.c:1464
#3 0x00000000004dd1cf in __wt_cache_full_check (session=0x25262e0)
    at ../src/include/cache.i:197
#4 0x00000000004de510 in __cursor_enter (session=0x25262e0)
    at ../src/include/cursor.i:63
#5 0x00000000004de5d8 in __curfile_enter (cbt=0x7f6da41c7800)
    at ../src/include/cursor.i:96
#6 0x00000000004de791 in __cursor_func_init (cbt=0x7f6da41c7800, reenter=0)
    at ../src/include/cursor.i:198
#7 0x00000000004e00a9 in __wt_btcur_next (cbt=0x7f6da41c7800, truncating=0)
    at ../src/btree/bt_curnext.c:415
#8 0x00000000004b144b in __curfile_next (cursor=0x7f6da41c7800)
    at ../src/cursor/cur_file.c:113
#9 0x00000000004c7373 in __clsm_next (cursor=0x7f6da4c21a20)
    at ../src/lsm/lsm_cursor.c:795
#10 0x00000000004ce9ad in __lsm_bloom_create (session=0x25262e0,
    lsm_tree=0x25034e0, chunk=0x7f6d48003d30, chunk_off=7)
    at ../src/lsm/lsm_work_unit.c:405
#11 0x00000000004ce124 in __wt_lsm_work_bloom (session=0x25262e0,
    lsm_tree=0x25034e0) at ../src/lsm/lsm_work_unit.c:224
#12 0x000000000043b2cf in __lsm_worker_general_op (session=0x25262e0,
    cookie=0x2517948, completed=0x7f6dfbb76ee0) at ../src/lsm/lsm_worker.c:74
#13 0x000000000043b3bf in __lsm_worker (arg=0x2517948)
    at ../src/lsm/lsm_worker.c:122
I don't think bloom filter creation expects to get stuck waiting for space in the cache. In [...] we set the [...] flag when doing a post-create traversal of the bloom filter. We don't set that flag when traversing the chunk to create the bloom filter itself, even though we set [...]
Alternatively we could fiddle with the LSM worker thread work unit assignments, so that the thread that only does switches and drops (very short lived operations) could do flushes as well if we've stopped making progress. The difficulty would be in determining when we are and aren't making progress.
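One possible way to approach the "are we making progress" question is to watch a counter of completed work units and widen the switch/drop thread's duties only after that counter has stayed flat for several consecutive checks. The sketch below is a Python model under that assumption; `ProgressMonitor`, `STALL_CHECKS`, `allowed_ops` and the operation names are hypothetical and do not correspond to existing WiredTiger structures.

```python
SWITCH, DROP, FLUSH = "switch", "drop", "flush"
STALL_CHECKS = 3  # hypothetical: consecutive flat checks before declaring a stall


class ProgressMonitor:
    def __init__(self, stall_checks=STALL_CHECKS):
        self.stall_checks = stall_checks
        self.last_completed = 0
        self.flat_checks = 0

    def check(self, completed):
        """Record the completed-work count; return True if we look stalled."""
        if completed > self.last_completed:
            self.last_completed = completed
            self.flat_checks = 0
        else:
            self.flat_checks += 1
        return self.flat_checks >= self.stall_checks


def allowed_ops(stalled):
    """Operations the switch/drop worker may take on."""
    ops = {SWITCH, DROP}
    if stalled:
        ops.add(FLUSH)  # only help with flushes once progress has stopped
    return ops


monitor = ProgressMonitor()
# Completed-work counter sampled at six successive checks.
history = [monitor.check(n) for n in (1, 2, 2, 2, 2, 3)]
print(history)
print(sorted(allowed_ops(stalled=True)))
```

A flat counter can also mean that no work is queued at all, so a real detector would need to distinguish an idle tree from a stalled one, for example by also checking that work units are pending.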
- is related to
  - WT-1722 Don't allow LSM bloom create to block waiting for space in the cache. (Closed)