Description
The default parallel-pop-lsm runner works, but if you increase the number of populate threads to 10, it fails for me on pixiebob.
# wtperf options file: Run populate thread multi-threaded and with groups
# of operations in each transaction.
conn_config="cache_size=200MB"
table_config="lsm_chunk_size=1M,type=lsm"
transaction_config="isolation=snapshot"
icount=10000000
report_interval=5
stat_interval=4
run_time=20
populate_ops_per_txn=100
populate_threads=10
verbose=1
Here are the stacks:
thread 15: execute_populate
thread 14: eviction server
thread 13: __wt_lsm_stat_init (waiting on the LSM lock)
thread 12: failing thread
thread 11: failing thread
threads 1, 2, 3, 4, 5, 6, 7, 8, 9, 10: sleeping in __clsm_put, in this loop:
    while (clsm->dsk_gen == lsm_tree->dsk_gen)
        __wt_sleep(0, 10);
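Threads 1 through 10 are parked in a polling loop. Each writer waits for the tree's dsk_gen to move past the generation its cursor last saw, and sleeps between checks. The sketch below models that pattern in isolation. It assumes a single shared 64-bit generation counter and assumes that __wt_sleep(0, 10) means 0 seconds plus 10 microseconds. The function name `wait_for_gen_change` and the `max_polls` bound are hypothetical and not part of WiredTiger. The loop in the stack has no such bound.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/*
 * wait_for_gen_change --
 *     Hypothetical sketch of the polling pattern in __clsm_put: sleep in
 *     10-microsecond steps until the shared generation differs from the
 *     one the caller last saw. Unlike the loop in the stack, this version
 *     gives up after max_polls sleeps.
 *     Returns true if the generation changed.
 */
static bool
wait_for_gen_change(_Atomic uint64_t *tree_gen, uint64_t seen_gen,
    unsigned max_polls)
{
    const struct timespec ten_us = { .tv_sec = 0, .tv_nsec = 10 * 1000 };
    unsigned i;

    assert(tree_gen != NULL);
    for (i = 0; i < max_polls; i++) {
        if (atomic_load(tree_gen) != seen_gen)
            return (true);
        nanosleep(&ten_us, NULL);   /* Like __wt_sleep(0, 10). */
    }
    /* One last look after the final sleep. */
    return (atomic_load(tree_gen) != seen_gen);
}
```

The bound is an illustrative design choice. It lets a caller notice that no worker has advanced the generation, where the original loop would keep polling indefinitely.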
Thread 12:
#4 0x0000000000478398 in __wt_abort (session=0x80204ba28)
    at ../src/os_posix/os_abort.c:21
#5 0x0000000000426597 in __wt_assert (session=Could not find the frame base for "__wt_assert".
) at ../src/support/err.c:408
#6 0x0000000000411f44 in __lsm_free_chunks (session=0x80204ba28,
    lsm_tree=0x8023de600) at ../src/lsm/lsm_worker.c:621
#7 0x0000000000410b58 in __wt_lsm_merge_worker (vargs=0x80201e450)
    at ../src/lsm/lsm_worker.c:127

(gdb) frame 6
#6 0x0000000000411f44 in __lsm_free_chunks (session=0x80204ba28,
    lsm_tree=0x8023de600) at ../src/lsm/lsm_worker.c:621
621 WT_ASSERT(session, lsm_tree->old_chunks[skipped] == chunk);

(gdb) p cookie
$13 = {chunk_array = 0x802619c00, chunk_alloc = 1280, nchunks = 82}
(gdb) p i
$14 = 75
(gdb) p skipped
$15 = 0
(gdb) p progress
$16 = 1
(gdb) p chunk
$99 = (WT_LSM_CHUNK *) 0x802728ce0
(gdb) p *chunk
$100 = {id = 269, generation = 1, uri = 0x8027e1540 "file:test-000269.lsm",
    bloom_uri = 0x8027ed460 "file:test-000269.bf", count = 135834,
    create_ts = {tv_sec = 1381325195, tv_nsec = 118098271},
    refcnt = 1, txnid_max = 0, flags = 24}

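If I look at the list of chunks in the cookie, all of them have a refcnt of 2 except for the chunk we're looking at.
OK, I think the problem here is that when the loop skips a chunk because chunk->refcnt > 1, it continues without incrementing skipped. The busy chunks stay in lsm_tree->old_chunks, so skipped no longer indexes the current chunk (here i = 75 but skipped = 0), and the assertion at lsm_worker.c:621 fails.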
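The following sketch makes the suspected bug concrete. It is a simplified, self-contained model of the loop in __lsm_free_chunks, written from the description above rather than from the real source. `struct chunk`, `free_old_chunks`, `MAX_CHUNKS` and the `count_busy_skips` switch are hypothetical. The live `old` array plays the role of lsm_tree->old_chunks, and the snapshot copy plays the role of the cookie. The mismatch check stands in for the WT_ASSERT at lsm_worker.c:621.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CHUNKS 16

/* Hypothetical, stripped-down stand-in for WT_LSM_CHUNK. */
struct chunk {
    int id;
    int refcnt;
};

/*
 * free_old_chunks --
 *     Simplified model of the loop in __lsm_free_chunks. Walk a snapshot
 *     ("cookie") of the live old-chunk array, free each chunk no cursor
 *     references, and compact the live array as we go. "skipped" must
 *     index the live slot holding the current snapshot chunk. If
 *     count_busy_skips is false, busy chunks are passed over without
 *     bumping "skipped" (the suspected bug). Returns how many chunks
 *     failed the equivalent of the lsm_worker.c:621 check.
 */
static int
free_old_chunks(struct chunk **old, int *nold, bool count_busy_skips)
{
    struct chunk *cookie[MAX_CHUNKS];
    int i, n, skipped, violations;

    n = *nold;
    assert(n <= MAX_CHUNKS);
    memcpy(cookie, old, (size_t)n * sizeof(cookie[0]));

    for (i = skipped = violations = 0; i < n; i++) {
        struct chunk *c = cookie[i];

        /* Another cursor still holds this chunk: leave it in old[]. */
        if (c->refcnt > 1) {
            if (count_busy_skips)
                ++skipped;      /* The proposed fix. */
            continue;
        }

        /* WT_ASSERT(session, lsm_tree->old_chunks[skipped] == chunk) */
        if (old[skipped] != c) {
            ++violations;
            continue;
        }

        /* Drop slot "skipped" and shuffle later entries down. */
        memmove(&old[skipped], &old[skipped + 1],
            (size_t)(*nold - skipped - 1) * sizeof(old[0]));
        old[--*nold] = NULL;
    }
    return (violations);
}
```

Consider one busy chunk at the front, followed by two free ones. The buggy variant (`count_busy_skips = false`) leaves skipped at 0, so every later candidate fails the check. This mirrors the gdb state above (i = 75, skipped = 0). Bumping skipped on the busy path keeps the index aligned with the live array.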
Thread 11:
(gdb) where
#5 0x0000000000426597 in __wt_assert (session=Could not find the frame base for "__wt_assert".
) at ../src/support/err.c:408
#6 0x0000000000411b1d in __lsm_discard_handle (session=0x80204b820,
    uri=0x805ff32c0 "file:test-000243.lsm", checkpoint=0x0)
    at ../src/lsm/lsm_worker.c:491
#7 0x000000000041109f in __wt_lsm_checkpoint_worker (arg=0x8023de600)
    at ../src/lsm/lsm_worker.c:295

(gdb) frame 6
#6 0x0000000000411b1d in __lsm_discard_handle (session=0x80204b820,
    uri=0x805ff32c0 "file:test-000243.lsm", checkpoint=0x0)
    at ../src/lsm/lsm_worker.c:491
491 WT_ASSERT(session, S2BT(session)->modified == 0);

(gdb) p ((WT_BTREE *)session->dhandle->handle)->modified
$191 = 1
(gdb) p session->dhandle->name
$192 = 0x805ff3340 "file:test-000243.lsm"

(gdb) frame 7
#7 0x000000000041109f in __wt_lsm_checkpoint_worker (arg=0x8023de600)
    at ../src/lsm/lsm_worker.c:295
295 if ((ret = __lsm_discard_handle(
(gdb) p *chunk
$193 = {id = 243, generation = 0, uri = 0x805ff32c0 "file:test-000243.lsm",
    bloom_uri = 0x0, count = 10292, create_ts = {tv_sec = 1381325191,
    tv_nsec = 118344315}, refcnt = 2, txnid_max = 19371, flags = 24}
(gdb) p chunk->flags & 0x10
$194 = 16
(gdb) p chunk->flags & 0x04
$195 = 0

So we're discarding the handle of a chunk that is WT_LSM_CHUNK_ONDISK but not WT_LSM_CHUNK_EVICTED. The assertion fails because that btree's modified flag is still set ($191 = 1).
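As a worked check of the values printed above, this hypothetical helper decodes flags = 24 using the two masks that the gdb session probes. Like the report, it assumes that 0x10 is WT_LSM_CHUNK_ONDISK and 0x04 is WT_LSM_CHUNK_EVICTED. The macro, struct and function names are illustrative and are not taken from lsm.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks as probed in gdb; the report reads them as ONDISK and EVICTED. */
#define CHUNK_ONDISK_MASK   0x10u
#define CHUNK_EVICTED_MASK  0x04u

static_assert((CHUNK_ONDISK_MASK & CHUNK_EVICTED_MASK) == 0,
    "flag masks must not overlap");

/* Hypothetical decoded view of a chunk plus its btree's modified flag. */
struct chunk_state {
    bool ondisk;
    bool evicted;
    bool modified;
};

static struct chunk_state
decode_chunk(uint32_t flags, int btree_modified)
{
    struct chunk_state s;

    s.ondisk = (flags & CHUNK_ONDISK_MASK) != 0;
    s.evicted = (flags & CHUNK_EVICTED_MASK) != 0;
    s.modified = btree_modified != 0;
    return (s);
}

/*
 * The combination reported for chunk 243: flagged on disk, not evicted,
 * yet its btree is still marked modified. This is the state in which
 * WT_ASSERT(session, S2BT(session)->modified == 0) was hit.
 */
static bool
ondisk_but_modified(struct chunk_state s)
{
    return (s.ondisk && !s.evicted && s.modified);
}
```

Feeding in flags = 24 and modified = 1 from the session above yields on disk, not evicted, and modified, which is the inconsistent combination this thread tripped over.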