Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: WT2.9.0, 3.2.10, 3.3.11
Affects Version/s: None
Component/s: None
Labels: None
Sprint: None
Story Points: None

I've been doing some experimentation with failure injection.

During one of my experiments I found that WiredTiger had hung following the random injection of a calloc failure.

Looking at the stack, I found the following:

Thread 3 (Thread 0x7f62df85a700 (LWP 4986)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000445fd2 in __wt_cond_wait_signal (session=0x7f62e18649d0, cond=0x15ec080, usecs=100000, signalled=0x7f62df859e9f) at ../src/os_posix/os_mtx_cond.c:82
#2  0x0000000000428d79 in __wt_cond_wait (session=0x7f62e18649d0, cond=0x15ec080, usecs=100000) at ../src/include/misc.i:18
#3  0x000000000042a24d in __evict_server (arg=0x7f62e18649d0) at ../src/evict/evict_lru.c:241
#4  0x00007f62e0bfc555 in start_thread (arg=0x7f62df85a700) at pthread_create.c:333
#5  0x00007f62e00f9b9d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7f62df059700 (LWP 4987)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000445fd2 in __wt_cond_wait_signal (session=0x7f62e1864d10, cond=0x161d590, usecs=10000000, signalled=0x7f62df058ebf) at ../src/os_posix/os_mtx_cond.c:82
#2  0x000000000041ec48 in __wt_cond_wait (session=0x7f62e1864d10, cond=0x161d590, usecs=10000000) at ../src/include/misc.i:18
#3  0x000000000041f8e1 in __sweep_server (arg=0x7f62e1864d10) at ../src/conn/conn_sweep.c:272
#4  0x00007f62e0bfc555 in start_thread (arg=0x7f62df059700) at pthread_create.c:333
#5  0x00007f62e00f9b9d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f62e1938740 (LWP 4985)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000445fd2 in __wt_cond_wait_signal (session=0x0, cond=0x161d600, usecs=100000, signalled=0x7ffd9a6979ff) at ../src/os_posix/os_mtx_cond.c:82
#2  0x000000000048e28a in __wt_cond_wait (session=0x0, cond=0x161d600, usecs=100000) at ../src/include/misc.i:18
#3  0x000000000048fa8e in __wt_async_flush (session=0x7f62e1864010) at ../src/async/async_api.c:533
#4  0x000000000041c6c4 in __wt_connection_close (conn=0x15da370) at ../src/conn/conn_open.c:104
#5  0x00000000004155fd in wiredtiger_open (home=0x52a67d "WT_TEST", event_handler=0x0,
    config=0x15d9b30 "create,cache_size=21G,checkpoint_sync=false,mmap=false,session_max=1024,lsm_manager=(worker_thread_max=6),create,cache_size=21G,checkpoint_sync=false,mmap=false,session_max=1024,lsm_manager=(worker_th"...,
    wt_connp=0x7ffd9a697ed0) at ../src/conn/conn_api.c:2092
#6  0x000000000040ac7c in start_run (cfg=0x7ffd9a697ea0) at ../../../bench/wtperf/wtperf.c:1947
#7  0x000000000040a851 in start_all_runs (cfg=0x7ffd9a697ea0) at ../../../bench/wtperf/wtperf.c:1858
#8  0x000000000040bf90 in main (argc=5, argv=0x7ffd9a6981b8) at ../../../bench/wtperf/wtperf.c:2322

As I understand it, the failure was injected while creating the async worker threads. This sent the wiredtiger_open call into its error-handling path during setup, which calls __wt_connection_close, which in turn calls __wt_async_flush. The flush never completes because there is no async worker thread to process it.
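
To make the failure mode concrete, here is a minimal sketch of a flush that checks whether any workers exist before waiting on them. Everything in it (struct demo_async, demo_async_flush, the max_polls bound) is hypothetical and written for illustration; it is not WiredTiger code and not necessarily the fix that shipped. The bounded poll stands in for the timed condition wait visible in Thread 1 above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for async subsystem state (not WiredTiger's real struct). */
struct demo_async {
    uint32_t workers_started; /* worker threads that were actually created */
    bool flush_complete;      /* a worker would set this after handling the flush */
};

/*
 * demo_async_flush --
 *     Hypothetical guarded flush. Returns 0 when the flush is done or when
 *     there is nothing to wait for, ETIMEDOUT when workers exist but none
 *     completed the flush within max_polls checks.
 */
int
demo_async_flush(struct demo_async *async, unsigned max_polls)
{
    unsigned i;

    assert(async != NULL);

    /* Guard: with no workers, nothing can ever mark the flush complete. */
    if (async->workers_started == 0)
        return (0);

    /* Bounded poll; a real implementation would block on a condition variable. */
    for (i = 0; i < max_polls; i++)
        if (async->flush_complete)
            return (0);

    return (ETIMEDOUT);
}
```

With workers_started set to 0, as would be the case if thread creation failed during open, the sketch returns immediately instead of waiting. Without the guard, the same state would keep the caller polling for a completion that no thread can deliver, which matches the hang seen here.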

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: David Hows (Inactive)
Votes: 0
Watchers: 3

Created: Jan 27 2016 05:16:41 AM UTC
Updated: Jun 06 2019 10:29:15 PM UTC
Resolved: Jul 12 2016 07:17:26 PM UTC
