WiredTiger / WT-3097

Race on reconfigure or shutdown can lead to waiting for statistics log server

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: WT2.9.2, 3.2.13, 3.4.3, 3.5.4
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Storage 2017-01-23, Storage 2017-02-13

      The Jenkins test machine detected a hang when running the reconfigure test. The hang occurred while shutting down the statistics log server: the reconfigure thread ends up waiting for the statistics logging thread to wake. The current wait time is 76,000 seconds, so it will wait essentially forever.

      The relevant call stacks are:

      Thread 14 (Thread 0x7fa38eff5700 (LWP 50565)):
      #0  0x0000003e6ce0c8e9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00000000004394b1 in __wt_cond_wait_signal (session=session@entry=0x1577720, cond=0x16780e0,
          usecs=<optimized out>, signalled=signalled@entry=0x7fa38eff4ecf) at ../src/os_posix/os_mtx_cond.c:71
      #2  0x00000000004127b1 in __wt_cond_wait (usecs=<optimized out>, cond=<optimized out>, session=0x1577720)
          at ../src/include/misc.i:18
      #3  __statlog_server (arg=0x1577720) at ../src/conn/conn_stat.c:531
      #4  0x0000003e6ce07555 in start_thread () from /lib64/libpthread.so.0
      #5  0x0000003e6cb02ded in clone () from /lib64/libc.so.6
      Thread 1 (Thread 0x7fa3a8b59700 (LWP 48376)):
      #0  0x0000003e6ca349c8 in raise () from /lib64/libc.so.6
      #1  0x0000003e6ca3665a in abort () from /lib64/libc.so.6
      #2  0x0000000000405883 in on_alarm (signo=<optimized out>) at ../../../test/csuite/wt2719_reconfig/main.c:203
      #3  <signal handler called>
      #4  0x0000003e6ce0859b in pthread_join () from /lib64/libpthread.so.0
      #5  0x00000000004397c3 in __wt_thread_join (session=session@entry=0x1571ec0, tid=<optimized out>)
          at ../src/os_posix/os_thread.c:37
      #6  0x0000000000412a8b in __wt_statlog_destroy (is_close=false, session=0x1571ec0) at ../src/conn/conn_stat.c:639
      #7  __wt_statlog_create (session=session@entry=0x1571ec0, cfg=cfg@entry=0x7ffc9e527180) at ../src/conn/conn_stat.c:613
      #8  0x0000000000409461 in __conn_reconfigure (wt_conn=0x155c070,
          config=0x163e890 ",cache_size=214MB,eviction=(threads_max=18,threads_min=16)") at ../src/conn/conn_api.c:1143
      #9  0x0000000000405614 in reconfig (config=0x163e890 ",cache_size=214MB,eviction=(threads_max=18,threads_min=16)",
          session=0x15731b0, opts=0x7ffc9e5271f0) at ../../../test/csuite/wt2719_reconfig/main.c:221
      #10 main (argc=<optimized out>, argv=<optimized out>) at ../../../test/csuite/wt2719_reconfig/main.c:300
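
      In the stacks above, Thread 14 is asleep in pthread_cond_timedwait inside __statlog_server, while Thread 1 is blocked in pthread_join under __wt_statlog_destroy. The report does not spell out the exact race, but one common way to end up in this state is a lost wakeup: the stop request and its signal arrive after the server has decided to sleep but before it is actually waiting. The sketch below is a minimal, generic pthreads illustration of a shutdown pattern that avoids this. It is not WiredTiger's code or its fix; struct server, server_start and server_stop are hypothetical names, and the 76,000 second interval is borrowed from the report only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a background server thread (not a WiredTiger type). */
struct server {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool run; /* Protected by lock. */
    pthread_t tid;
};

static void *
server_main(void *arg)
{
    struct server *s = arg;
    struct timespec deadline;

    pthread_mutex_lock(&s->lock);
    while (s->run) {
        /* Sleep for a very long interval unless signalled earlier. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 76000;

        /*
         * The run flag was checked with the lock held, and the wait
         * releases the lock atomically, so a stop request cannot slip
         * in between the check and the sleep.
         */
        (void)pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
        if (!s->run)
            break;
        /* Periodic work (for example, writing statistics) would go here. */
    }
    pthread_mutex_unlock(&s->lock);
    return (NULL);
}

/* Initialize the server state and start its thread; returns 0 on success. */
static int
server_start(struct server *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->run = true;
    return (pthread_create(&s->tid, NULL, server_main, s));
}

/* Ask the server to stop, wake it, and wait for it to exit. */
static int
server_stop(struct server *s)
{
    int ret;

    /* Clear the flag and signal under the same lock the waiter uses. */
    pthread_mutex_lock(&s->lock);
    s->run = false;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);

    ret = pthread_join(s->tid, NULL);
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
    return (ret);
}
```

      A caller would pair server_start(&s) with server_stop(&s). Because the flag is only read and written with the lock held, the join returns promptly whether the server thread is already asleep, about to sleep, or has not yet started running.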

      There was a previous attempt to fix this in WT-3078.

      The failure is:

            keith.bostic@mongodb.com Keith Bostic (Inactive)
            alexander.gorrod@mongodb.com Alexander Gorrod