- Type: Task
- Status: Closed
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: WT2.0
- Component/s: None
- Labels: None
I came across a hang running test/format with LSM. It doesn't seem to be directly related to LSM.
I see the following stack traces when the application is hung:
(gdb) thread apply all where

Thread 2 (Thread 0x7ffff77d6700 (LWP 45983)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:217
#1  0x00000000004238c3 in __wt_cond_wait (session=0x8c46c0, cond=0x8cb320,
    usecs=100000) at ../src/os_posix/os_mtx.c:75
#2  0x000000000043e055 in __wt_cache_evict_server (arg=0x8c46c0)
    at ../src/btree/bt_evict.c:167
#3  0x000000383c007d15 in start_thread (arg=0x7ffff77d6700)
    at pthread_create.c:308
#4  0x000000383b8f248d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:114

Thread 1 (Thread 0x7ffff7de1740 (LWP 45974)):
#0  pthread_rwlock_wrlock ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
#1  0x0000000000423eb0 in __wt_writelock (session=0x8c4cf0,
    rwlock=0x7ffff0410060) at ../src/os_posix/os_mtx.c:239
#2  0x000000000046487c in __wt_conn_btree_close (session=0x8c4cf0, locked=0)
    at ../src/conn/conn_dhandle.c:480
#3  0x0000000000433c4d in __wt_session_discard_btree (session=0x8c4cf0,
    dhandle_cache=0x0) at ../src/session/session_dhandle.c:323
#4  0x00000000004300a9 in __session_close_cache (session=0x8c4cf0)
    at ../src/session/session_api.c:39
#5  0x0000000000430375 in __session_close (wt_session=0x8c4cf0, config=0x0)
    at ../src/session/session_api.c:82
#6  0x000000000041d3dd in __lsm_tree_close (session=0x8c48d0, lsm_tree=0x8e2440)
    at ../src/lsm/lsm_tree.c:139
#7  0x000000000041d48a in __wt_lsm_tree_close_all (session=0x8c48d0)
    at ../src/lsm/lsm_tree.c:166
#8  0x000000000041b53f in __wt_connection_close (conn=0x8c2420)
    at ../src/conn/conn_open.c:94
#9  0x0000000000417580 in __conn_close (wt_conn=0x8c2420, config=0x0)
    at ../src/conn/conn_api.c:386
#10 0x0000000000415789 in wts_close () at ../../../test/format/wts.c:269
#11 0x0000000000413a25 in main (argc=0, argv=0x7fffffffe370)
    at ../../../test/format/t.c:158

Thread 1 is the interesting one: it's attempting to get a write lock on a dhandle during connection close. The dhandle has a reference count of 0, yet the write lock can't be acquired, which implies that something still holds the lock.
The dhandle does not refer to the metadata file, and the session isn't the default session.
I suspect that we're either failing to clean up after an error from a __wt_session_lock_btree call (though I'd expect to see a different error earlier in that case), or racing while opening a handle and leaving it in a bad state.
The config I used to produce this is:
############################################
# RUN PARAMETERS
############################################
# bitcnt not applicable to this run
cache=94
compression=bzip
data_extend=0
data_source=lsm
delete_pct=14
dictionary=0
file_type=row-store
huffman_key=0
huffman_value=0
insert_pct=40
internal_key_truncation=0
internal_page_max=14
key_gap=4
key_max=102
key_min=27
leaf_page_max=21
ops=382650
prefix=1
repeat_data_pct=37
reverse=0
rows=600
runs=0
split_pct=65
threads=10
value_max=2186
value_min=3
#wiredtiger_config=lsm_merge=false
write_pct=5
############################################

The most interesting thing about this configuration is that it sets data_source=lsm.