Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-677

LSM cursor assert

    XMLWordPrintable

    Details

    • Type: Task
    • Status: Closed
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: WT1.6.5
    • Component/s: None
    • Labels:
      None

      Description

      In running the evict-lsm wtperf config on the big/original AWS HD system, I get the following assertion failure with the current end of develop (changeset 6fe31533cbc60171908361de260262a537debd33).

      This is happening between the populate phase and workload phase. There are only 3 threads present.

      file:test-000164.lsm: ../src/lsm/lsm_cursor.c, 400: F_ISSET(chunk, WT_LSM_CHUNK_BLOOM) ? clsm->blooms[i] != NULL : clsm->blooms[i] == NULL
      file:test-000164.lsm: aborting WiredTiger library

      Attaching in the debugger:
      (gdb) thread apply all bt

      Thread 3 (Thread 0x7fbb1faa1700 (LWP 6394)):
      #0 0x00007fbb2076dc6b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      WT-1 0x0000000000414f29 in __wt_cond_wait (session=0x1902060, cond=0x1908ab0, usecs=100000) at ../src/os_posix/os_mtx.c:75
      WT-2 0x00000000004306c7 in __wt_cache_evict_server (arg=0x1902060) at ../src/btree/bt_evict.c:167
      WT-3 0x00007fbb20769c6b in start_thread () from /lib64/libpthread.so.0
      WT-4 0x00007fbb200a25ed in clone () from /lib64/libc.so.6

      Thread 2 (Thread 0x7fbb1da9d700 (LWP 6398)):
      #0 0x00007fbb2009b7b3 in select () from /lib64/libc.so.6
      WT-1 0x0000000000416875 in __wt_sleep (seconds=100, micro_seconds=0) at ../src/os_posix/os_sleep.c:22
      WT-2 0x0000000000426cd1 in __wt_attach (session=0x1902cc0) at ../src/support/global.c:97
      WT-3 0x000000000047700a in __wt_abort (session=0x1902cc0) at ../src/os_posix/os_abort.c:21
      WT-4 0x0000000000426721 in __wt_assert (session=0x1902cc0, error=0, file_name=0x4b1eb6 "../src/lsm/lsm_cursor.c", line_number=400,
      fmt=0x4b1eb3 "%s") at ../src/support/err.c:408
      WT-5 0x000000000046f22a in __clsm_open_cursors (clsm=0x7fbb001eb190, update=0, start_chunk=0, start_id=12) at ../src/lsm/lsm_cursor.c:399
      WT-6 0x000000000046f316 in __wt_clsm_init_merge (cursor=0x7fbb001eb190, start_chunk=0, start_id=12, nchunks=15) at ../src/lsm/lsm_cursor.c:424
      WT-7 0x0000000000473538 in __wt_lsm_merge (session=0x1902cc0, lsm_tree=0x1933040, id=0, stalls=0) at ../src/lsm/lsm_merge.c:213
      WT-8 0x00000000004108b4 in __wt_lsm_merge_worker (vargs=0x1933fe0) at ../src/lsm/lsm_worker.c:87
      WT-9 0x00007fbb20769c6b in start_thread () from /lib64/libpthread.so.0
      WT-10 0x00007fbb200a25ed in clone () from /lib64/libc.so.6

      Thread 1 (Thread 0x7fbb2121a740 (LWP 6393)):
      #0 0x00007fbb2076adaf in pthread_join () from /lib64/libpthread.so.0
      WT-1 0x0000000000416912 in __wt_thread_join (session=0x1901e50, tid=140441633281792) at ../src/os_posix/os_thread.c:35
      WT-2 0x000000000040e0be in __lsm_tree_close (session=0x1901e50, lsm_tree=0x1933040) at ../src/lsm/lsm_tree.c:87
      WT-3 0x000000000040e4f0 in __wt_lsm_tree_close_all (session=0x1901e50) at ../src/lsm/lsm_tree.c:166
      WT-4 0x000000000040c3f3 in __wt_connection_close (conn=0x19010f0) at ../src/conn/conn_open.c:83
      WT-5 0x0000000000408862 in __conn_close (wt_conn=0x19010f0, config=0x0) at ../src/conn/conn_api.c:518
      WT-6 0x000000000040520d in main (argc=5, argv=0x7fff89605828) at ../../../bench/wtperf/wtperf.c:1093

      (gdb) p/x *chunk
      $1 = {id = 0x1d, generation = 0x1, uri = 0x7fbb00050fe0, bloom_uri = 0x7fbb000990a0, count = 0x2c00e, create_ts =

      {tv_sec = 0x523094d1, tv_nsec = 0x3a202f0c}

      , txnid_max = 0x0, flags = 0x1d}

      (Flags field translates to: BLOOM | EVICTED | MERGING | ONDISK)

      (gdb) p/x *clsm
      $2 = {iface = {session = 0x1902cc0, uri = 0x19329d0, key_format = 0x1932890, value_format = 0x19328b0, get_key = 0x465d7d,
      get_value = 0x46692b, set_key = 0x465e40, set_value = 0x466c92, compare = 0x46f5f9, next = 0x46f9d6, prev = 0x46fea3, reset = 0x470423,
      search = 0x470519, search_near = 0x470a9d, insert = 0x47199c, update = 0x471eb9, remove = 0x4723be, close = 0x47275c, q =

      { tqe_next = 0x7fbb0013bf90, tqe_prev = 0x1902d88}

      , recno = 0x0, raw_recno_buf =

      {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

      , key =

      { data = 0x0, size = 0x0, flags = 0x0, mem = 0x0, memsize = 0x0}

      , value =

      {data = 0x0, size = 0x0, flags = 0x0, mem = 0x0, memsize = 0x0}

      , saved_err = 0x0, flags = 0x380}, lsm_tree = 0x1933040, dsk_gen = 0x6a2, nchunks = 0xf, nupdates = 0x0,
      blooms = 0x7fbb005fdbe0, bloom_alloc = 0x78, cursors = 0x7fbb00655570, cursor_alloc = 0x78, current = 0x0, primary_chunk = 0x0,
      txnid_max = 0x0, txnid_alloc = 0x0, flags = 0x24}

      (Flags field translates to: MERGE | OPEN_READ)

      (gdb) p i
      $3 = 1
      (gdb) x/4x clsm->blooms
      0x7fbb005fdbe0: 0x00000000 0x00000000 0x00000000 0x00000000

      That bloom filter does exist:
      rw-rw-r- 1 sue sue 196608 Sep 11 16:05 test-000140.bf
      rw-rw-r- 1 sue sue 27136000 Sep 11 16:05 test-000140.lsm
      rw-rw-r- 1 sue sue 184320 Sep 11 16:05 test-000164.bf
      rw-rw-r- 1 sue sue 25268224 Sep 11 16:05 test-000164.lsm
      rw-rw-r- 1 sue sue 143360 Sep 11 16:05 test-000171.bf
      rw-rw-r- 1 sue sue 19013632 Sep 11 16:05 test-000171.lsm

      The number of LSM files goes up into the high 800s:
      rw-rw-r- 1 sue sue 1605632 Sep 11 16:06 test-000886.lsm
      rw-rw-r- 1 sue sue 1880064 Sep 11 16:06 test-000887.lsm
      rw-rw-r- 1 sue sue 1720320 Sep 11 16:06 test-000888.lsm
      rw-rw-r- 1 sue sue 4096 Sep 11 16:06 test-000894.lsm
      rw-rw-r- 1 sue sue 151552 Sep 11 16:06 test-000895.bf
      rw-rw-r- 1 sue sue 20480000 Sep 11 16:06 test-000895.lsm

      Looking at __clsm_open_cursors, the code in the "open cursors" loop has this conditional:
      if (F_ISSET(chunk, WT_LSM_CHUNK_BLOOM) &&
      !F_ISSET(clsm, WT_CLSM_MERGE))
      WT_ERR(__wt_bloom_open(session, chunk->bloom_uri,
      lsm_tree->bloom_bit_count,
      lsm_tree->bloom_hash_count,
      c, &clsm->blooms[i]));

      The code in the diagnostic check has:
      /_ Make sure the Bloom config matches. _/
      WT_ASSERT(session, F_ISSET(chunk, WT_LSM_CHUNK_BLOOM) ?
      clsm->blooms[i] != NULL : clsm->blooms[i] == NULL);

      The naive/obvious fix is to make the WT_ASSERT conditional match the earlier code.

      I will first test for reproducibility and then test that fix.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                sue.loverso Sue LoVerso
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: