  1. WiredTiger
  2. WT-440

Hang under LevelDB benchmark "small" config with 4 or more threads.

    • Type: Task
    • Resolution: Done
    • WT1.5.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Hi guys,

      I got some more info on why we hang when running the "small" config of LevelDB Bench with four or more threads.

      To reproduce:

      env LD_LIBRARY_PATH=../wt-dev-branch/build_posix/.libs:../wt-dev-branch/build_posix/ext/compressors/snappy/.libs/ TEST_TMPDIR= ./db_bench_wiredtiger --cache_size=6537216 --threads=4 --db=/tmpfs/leveldb --benchmarks=fillrandom,overwrite,readrandom

      The benchmark goes fine through the first two phases (fillrandom and overwrite), but then goes into an infinite loop in readrandom.

      Examining the stack trace gives me the following:

      #1 0x00007ffff795fdf9 in __wt_cache_full_check (session=0x6499b0) at ../src/include/cache.i:69
      #2 __wt_page_in_func (session=0x6499b0, parent=0x7fffe4308a10, ref=0x7fffe4308ad0) at ../src/btree/bt_page.c:42
      #3 0x00007ffff7984e40 in __wt_row_search (session=0x6499b0, cbt=0x7fffe41d5ed0, is_modify=0) at ../src/btree/row_srch.c:174
      #4 0x00007ffff795301d in __wt_btcur_search (cbt=0x7fffe41d5ed0) at ../src/btree/bt_cursor.c:146
      #5 0x00007ffff798e052 in __curfile_search (cursor=0x7fffe41d5ed0) at ../src/cursor/cur_file.c:133
      #6 0x00007ffff7995b81 in __clsm_search (cursor=0x7fffe41caa50) at ../src/lsm/lsm_cursor.c:581
      #7 0x0000000000404077 in leveldb::Benchmark::ReadRandom(leveldb::(anonymous namespace)::ThreadState*) ()
      #8 0x000000000040897e in leveldb::Benchmark::ThreadBody(void*) ()
      #9 0x0000000000432a3a in leveldb::(anonymous namespace)::StartThreadWrapper(void*) ()
      #10 0x00007ffff6f07d86 in start_thread () from /lib64/libpthread.so.0
      #11 0x00007ffff6c4066d in clone () from /lib64/libc.so.6

      I checked every thread, and every one of them is stuck in the same place in an infinite loop. If we look at the code where we loop, we see the following:

      for (wake = 0;; wake = (wake + 1) % 100) {
              WT_RET(__wt_eviction_check(session, &lockout, wake == 0));
              if (!lockout || F_ISSET(session,
                  WT_SESSION_NO_CACHE_CHECK | WT_SESSION_SCHEMA_LOCKED))
                      return (0);
              if (F_ISSET(btree,
                  WT_BTREE_BULK | WT_BTREE_NO_CACHE | WT_BTREE_NO_EVICTION))
                      return (0);
              if ((ret = __wt_evict_lru_page(session, 1)) == EBUSY)
                      __wt_yield();
              else
                      WT_RET_NOTFOUND_OK(ret);
      }

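      What happens is that we drop all the way down to the last "else" statement but don't return, because the ret value is actually WT_NOTFOUND, which WT_RET_NOTFOUND_OK treats as success rather than returning it. Nothing else in the loop body can exit while the cache stays full, so we stay in the loop forever. It appears that we can't find a page to evict.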
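      To make the failure mode concrete, here is a minimal standalone sketch of the same loop shape. It is not WiredTiger code: SKETCH_NOTFOUND, SKETCH_RET_NOTFOUND_OK, evict_nothing, evict_fails and cache_full_check_sketch are hypothetical stand-ins, the cache is assumed to stay full for the whole run, and a spin bound is added only so the example terminates (the real loop has none).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for WiredTiger's "not found" error code. */
#define SKETCH_NOTFOUND (-1000)

/*
 * Stand-in for WT_RET_NOTFOUND_OK: return from the enclosing function only
 * on a real error. Both 0 and "not found" fall through to the caller's
 * next statement.
 */
#define SKETCH_RET_NOTFOUND_OK(call) do {                       \
        int r_ = (call);                                        \
        if (r_ != 0 && r_ != SKETCH_NOTFOUND)                   \
                return (r_);                                    \
} while (0)

/* Hypothetical evictor that never finds a page to evict. */
static int evict_nothing(void) { return (SKETCH_NOTFOUND); }

/* Hypothetical evictor that hits a real error. */
static int evict_fails(void) { return (EIO); }

/*
 * Same control flow as the tail of the loop in the report, with the cache
 * assumed to stay full so the early "return (0)" exits never fire.
 * Returns -1 when the artificial spin bound is reached; *spins reports how
 * many iterations ran.
 */
static int
cache_full_check_sketch(int (*evict)(void), int max_spins, int *spins)
{
        int ret;

        assert(max_spins > 0);
        for (*spins = 0; *spins < max_spins; ++*spins) {
                if ((ret = evict()) == EBUSY)
                        continue;       /* the real code yields here */
                else
                        SKETCH_RET_NOTFOUND_OK(ret);
                /* ret was 0 or SKETCH_NOTFOUND: go around again. */
        }
        return (-1);
}
```

      With evict_nothing the function only comes back because of the artificial bound, which matches every thread spinning in readrandom. With evict_fails it returns on the first pass. One possible direction, offered only as a suggestion, is to give the "not found" case its own exit or wait path instead of retrying immediately.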

            Unassigned Unassigned
            fedorova Alexandra (Sasha) Fedorova