WiredTiger / WT-713

wtperf parallel-pop-lsm failure

    Details

    • Type: Task
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT1.6.6
    • Labels:

      Description

      The default parallel-pop-lsm runner works, but with populate_threads increased to 10 it fails for me on pixiebob. The options file I used:

      # wtperf options file: Run populate thread multi-threaded and with groups
      # of operations in each transaction.
      conn_config="cache_size=200MB"
      table_config="lsm_chunk_size=1M,type=lsm"
      transaction_config="isolation=snapshot"
      icount=10000000
      report_interval=5
      stat_interval=4
      run_time=20
      populate_ops_per_txn=100
      populate_threads=10
      verbose=1
      

      Here are the stacks:

      thread 15       execute_populate
      thread 14       eviction server
      thread 13       __wt_lsm_stat_init (waiting on LSM lock)
      thread 12       failing thread
      thread 11       failing thread
      thread 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
                      __clsm_put sleeping
                      while (clsm->dsk_gen == lsm_tree->dsk_gen)
                              __wt_sleep(0, 10);
      

      Thread 12:

      #4  0x0000000000478398 in __wt_abort (session=0x80204ba28)
          at ../src/os_posix/os_abort.c:21
      #5  0x0000000000426597 in __wt_assert (session=Could not find the frame base for "__wt_assert".
      ) at ../src/support/err.c:408
      #6  0x0000000000411f44 in __lsm_free_chunks (session=0x80204ba28, 
          lsm_tree=0x8023de600) at ../src/lsm/lsm_worker.c:621
      #7  0x0000000000410b58 in __wt_lsm_merge_worker (vargs=0x80201e450)
          at ../src/lsm/lsm_worker.c:127
       
      (gdb) frame 6
      #6  0x0000000000411f44 in __lsm_free_chunks (session=0x80204ba28, 
          lsm_tree=0x8023de600) at ../src/lsm/lsm_worker.c:621
      621			WT_ASSERT(session, lsm_tree->old_chunks[skipped] == chunk);
       
      (gdb) p cookie
      $13 = {chunk_array = 0x802619c00, chunk_alloc = 1280, nchunks = 82}
      (gdb) p i
      $14 = 75
      (gdb) p skipped
      $15 = 0
      (gdb) p progress
      $16 = 1
      (gdb) p chunk
      $99 = (WT_LSM_CHUNK *) 0x802728ce0
      (gdb) p *chunk
      $100 = 
      {id = 269, generation = 1, uri = 0x8027e1540 "file:test-000269.lsm",
      bloom_uri = 0x8027ed460 "file:test-000269.bf", count = 135834,
      create_ts = { tv_sec = 1381325195, tv_nsec = 118098271},
      refcnt = 1, txnid_max = 0, flags = 24}
      

      If I look at the list of chunks in the cookie, all of them have a refcnt of 2 except for the chunk we're looking at.

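      OK, I think the problem here is that we're not incrementing skipped when we continue in the loop because chunk->refcnt > 1. That leaves skipped at 0 when we reach the chunk at i = 75, so old_chunks[skipped] is not the chunk we're about to free and the assertion fires.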
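
      To illustrate the suspected bug, here is a minimal model of the loop. It is a sketch, not the code in lsm_worker.c: struct chunk, struct tree, free_chunks and the count_busy_as_skipped switch are hypothetical names, and the function returns -1 where the real code would hit the WT_ASSERT.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for WT_LSM_CHUNK and WT_LSM_TREE. */
struct chunk {
	int id;
	int refcnt;
};
struct tree {
	struct chunk *old_chunks[8];
	size_t nold_chunks;
};

/*
 * Walk a snapshot of the old-chunk list and drop every chunk that nobody
 * else references. Returns the number of chunks dropped, or -1 where the
 * real code would fail its assertion.
 */
static int
free_chunks(struct tree *t, int count_busy_as_skipped)
{
	struct chunk *snap[8];
	size_t i, n, skipped;
	int freed;

	/* Copy the list, playing the role of the cookie in the ticket. */
	n = t->nold_chunks;
	memcpy(snap, t->old_chunks, n * sizeof(snap[0]));

	for (i = skipped = 0, freed = 0; i < n; i++) {
		if (snap[i]->refcnt > 1) {
			/* Suspected fix: a busy chunk stays in the list. */
			if (count_busy_as_skipped)
				++skipped;
			continue;
		}

		/* The invariant checked at lsm_worker.c:621. */
		if (t->old_chunks[skipped] != snap[i])
			return (-1);

		/* Remove the entry and shuffle the rest down. */
		--t->nold_chunks;
		memmove(&t->old_chunks[skipped], &t->old_chunks[skipped + 1],
		    (t->nold_chunks - skipped) * sizeof(t->old_chunks[0]));
		++freed;
	}
	return (freed);
}
```

      With three chunks whose refcnt values are 2, 2 and 1, free_chunks(&t, 0) returns -1 (the assertion case from thread 12), while free_chunks(&t, 1) drops the last chunk and returns 1.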

      Thread 11:

      (gdb) where
      #5  0x0000000000426597 in __wt_assert (session=Could not find the frame base for "__wt_assert".
      ) at ../src/support/err.c:408
      #6  0x0000000000411b1d in __lsm_discard_handle (session=0x80204b820, 
          uri=0x805ff32c0 "file:test-000243.lsm", checkpoint=0x0)
          at ../src/lsm/lsm_worker.c:491
      #7  0x000000000041109f in __wt_lsm_checkpoint_worker (arg=0x8023de600)
          at ../src/lsm/lsm_worker.c:295
       
      (gdb) frame 6
      #6  0x0000000000411b1d in __lsm_discard_handle (session=0x80204b820, 
          uri=0x805ff32c0 "file:test-000243.lsm", checkpoint=0x0)
          at ../src/lsm/lsm_worker.c:491
      491		WT_ASSERT(session, S2BT(session)->modified == 0);
       
      (gdb) p ((WT_BTREE *)session->dhandle->handle)->modified
      $191 = 1
      (gdb) p session->dhandle->name
      $192 = 0x805ff3340 "file:test-000243.lsm"
       
      (gdb) frame 7
      #7  0x000000000041109f in __wt_lsm_checkpoint_worker (arg=0x8023de600)
          at ../src/lsm/lsm_worker.c:295
      295					if ((ret = __lsm_discard_handle(
      (gdb) p *chunk
      $193 = {id = 243, generation = 0, uri = 0x805ff32c0 "file:test-000243.lsm", 
        bloom_uri = 0x0, count = 10292, create_ts = {tv_sec = 1381325191, 
          tv_nsec = 118344315}, refcnt = 2, txnid_max = 19371, flags = 24}
      (gdb) p chunk->flags & 0x10
      $194 = 16
      (gdb) p chunk->flags & 0x04
      $195 = 0
      

      So we're discarding a chunk that is WT_LSM_CHUNK_ONDISK but not WT_LSM_CHUNK_EVICTED, and the assertion fires because the btree handle's modified flag is still set.
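
      The flag check from the gdb session can be written out as a small helper. This is only a sketch: the bit values 0x10 and 0x04 are the masks used in the gdb commands, paired here with the ONDISK and EVICTED names on the assumption that the reading above is right; they are not copied from the WiredTiger headers, and the CHUNK_* macros and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit assignments, inferred from the gdb masks in this ticket. */
#define CHUNK_EVICTED	0x04u
#define CHUNK_ONDISK	0x10u

/* True for the state seen in thread 11: on disk, but not yet evicted. */
static int
on_disk_not_evicted(uint32_t flags)
{
	return ((flags & CHUNK_ONDISK) != 0 && (flags & CHUNK_EVICTED) == 0);
}

/* Bits that are set but that this ticket does not name. */
static uint32_t
unnamed_bits(uint32_t flags)
{
	return (flags & ~(CHUNK_EVICTED | CHUNK_ONDISK));
}
```

      For flags = 24 (0x18), on_disk_not_evicted returns 1 and unnamed_bits returns 0x08, a bit the gdb session does not examine.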

    People

    • Assignee: Keith Bostic (keith.bostic)
    • Reporter: Keith Bostic (keith.bostic)