Details
Type: Task
Status: Closed
Resolution: Done
Description
I saw a stress-test failure a few days ago, and it seems to reproduce relatively quickly on pixiebob. Here's the CONFIG:
############################################
# RUN PARAMETERS
############################################
auto_throttle=1
firstfit=1
bitcnt=3
bloom=1
bloom_bit_count=10
bloom_hash_count=24
bloom_oldest=1
cache=15
checkpoints=1
checksum=uncompressed
chunk_size=7
compaction=0
compression=zlib
data_extend=0
data_source=file
delete_pct=22
dictionary=0
evict_max=0
file_type=row-store
backups=0
huffman_key=0
huffman_value=0
insert_pct=13
internal_key_truncation=1
internal_page_max=9
isolation=read-committed
key_gap=14
key_max=256
key_min=256
leak_memory=0
leaf_page_max=9
logging=0
merge_max=4
merge_threads=3
mmap=1
ops=100000
prefix_compression=1
prefix_compression_min=3
repeat_data_pct=7
reverse=0
rows=100000
runs=100
split_pct=72
statistics=0
threads=13
value_max=2638
value_min=10
wiredtiger_config=
write_pct=36
############################################

Looking at the threads:
T 12: main, sleeping
T 11: eviction server, the failing thread
T 10: sweep server, sleeping
T 3, 4, 5, 6, 7, 8, 9: worker threads, trying to do a named checkpoint and waiting on format's hot-backup lock
T 2: doing a named checkpoint and trying to clear the tree's current eviction point
T 1: starting a transaction and doing a cache-full check, waiting to find a page on the eviction queue to evict

Thread 11's stack:
#4  0x00000000004da528 in __wt_abort (session=0x803035280) at src/os_posix/os_abort.c:21
#5  0x0000000000459e3c in __wt_assert (session=0x803035280, error=0, file_name=0x531580 "src/btree/bt_discard.c",
    line_number=56, fmt=0x522129 "%s") at src/support/err.c:452
#6  0x00000000004f5077 in __wt_page_out (session=0x803035280, pagep=0x807f293a0) at src/btree/bt_discard.c:56
#7  0x00000000004f4fff in __wt_ref_out (session=0x803035280, ref=0x807f293a0) at src/btree/bt_discard.c:32
#8  0x0000000000489898 in __rec_page_dirty_update (session=0x803035280, ref=0x807f293a0, exclusive=0)
    at src/btree/rec_evict.c:164
#9  0x0000000000488c0e in __wt_rec_evict (session=0x803035280, ref=0x807f293a0, exclusive=0) at src/btree/rec_evict.c:88
#10 0x0000000000473e6c in __wt_evict_page (session=0x803035280, ref=0x807f293a0) at src/btree/bt_evict.c:616
#11 0x0000000000474403 in __wt_evict_lru_page (session=0x803035280, is_app=0) at src/btree/bt_evict.c:1218
#12 0x0000000000474e12 in __evict_lru_pages (session=0x803035280, is_app=0) at src/btree/bt_evict.c:701
#13 0x0000000000475879 in __evict_lru (session=0x803035280, flags=3) at src/btree/bt_evict.c:794
#14 0x000000000047514c in __evict_pass (session=0x803035280) at src/btree/bt_evict.c:484
#15 0x000000000047352f in __evict_server (arg=0x803035280) at src/btree/bt_evict.c:162

And we're attempting to discard a dirty page, which trips the assertion at src/btree/bt_discard.c:56:

52 /*
53  * We should never discard a dirty page, the file's current eviction
54  * point or a page queued for LRU eviction.
55  */
56 WT_ASSERT(session, !__wt_page_is_modified(page));
57 WT_ASSERT(session, !F_ISSET_ATOMIC(page, WT_PAGE_EVICT_LRU));
58 WT_ASSERT(session, !F_ISSET_ATOMIC(page, WT_PAGE_SPLITTING));
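To make the failing check concrete, here is a minimal C sketch of the three discard preconditions in the excerpt above. It is not WiredTiger code: the sketch_* types, flag values and functions are hypothetical, and treating a nonzero write_gen as "modified" is an assumption based on the page->modify dump in this report (the dirty page has write_gen = 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for WT_PAGE and WT_PAGE_MODIFY.
 * The real WiredTiger structures and flag values differ.
 */
#define SKETCH_PAGE_EVICT_LRU 0x01u /* page is queued for LRU eviction */
#define SKETCH_PAGE_SPLITTING 0x02u /* page is being split */

struct sketch_modify {
    uint32_t write_gen; /* assumption: nonzero means unwritten changes */
};

struct sketch_page {
    struct sketch_modify *modify; /* NULL if the page was never modified */
    uint8_t flags_atomic;
};

/* Assumed analogue of __wt_page_is_modified(). */
static bool sketch_page_is_modified(const struct sketch_page *page)
{
    return page->modify != NULL && page->modify->write_gen != 0;
}

/* True only if all three preconditions from bt_discard.c:56-58 hold. */
static bool sketch_page_discard_ok(const struct sketch_page *page)
{
    if (sketch_page_is_modified(page))
        return false; /* line 56: never discard a dirty page */
    if (page->flags_atomic & SKETCH_PAGE_EVICT_LRU)
        return false; /* line 57: never discard a page queued for LRU eviction */
    if (page->flags_atomic & SKETCH_PAGE_SPLITTING)
        return false; /* line 58: never discard a page that is splitting */
    return true;
}

/*
 * Hypothetical analogue of the checks at the top of __wt_page_out():
 * like WT_ASSERT in a diagnostic build, assert() aborts when a
 * precondition fails (compare the __wt_assert/__wt_abort frames).
 */
static void sketch_page_out(const struct sketch_page *page)
{
    assert(sketch_page_discard_ok(page));
    /* A real implementation would free the page's memory here. */
}
```

For a page shaped like the one in this report (modify set, write_gen = 1, no flags), sketch_page_discard_ok() returns false, which corresponds to the condition the real assertion at line 56 caught.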

And the page is empty: it has no disk image (dsk = 0x0) and no entries:

(gdb) p *page
$4 = {u = {intl = {recno = 34466205048, parent_ref = 0x8065c10e8, __index = 0x0}, row = {d = 0x806588d78, ins = 0x8065c10e8,
    upd = 0x0, entries = 0}, col_fix = {recno = 34466205048, bitf = 0x8065c10e8 "", entries = 0}, col_var = {
    recno = 34466205048, d = 0x8065c10e8, repeats = 0x0, nrepeats = 0, entries = 0}}, dsk = 0x0, modify = 0x806597520,
  read_gen = 130743, memory_footprint = 3195, type = 7 '\a', flags_atomic = 0 '\0'}

(gdb) p *page->modify
$6 = {disk_snap_min = 0, rec_max_txn = 0, first_dirty_txn = 91156, update_txn = 91156, bytes_dirty = 3195, u1 = {replace = {
      addr = 0x0, size = 0 '\0', type = 2 '\002', reuse = 0 '\0'}, m = {multi = 0x0, multi_entries = 512}}, u2 = {intl = {
      root_split = 0x0, split_txn = 0}, leaf = {append = 0x0, update = 0x0}}, ovfl_track = 0x807704b00, write_gen = 1,
  page_lock = 2 '\002', flags = 1 '\001'}

Despite being empty, the page is dirty (write_gen = 1, bytes_dirty = 3195), and there's one element on its insert list:

0x806588d30: row-store leaf
    disk 0x0, entries 0, dirty, empty, write generation=1
    insert {0000078100.00/opqrstuvwxyzabcdefghijklmnop ... }
    value {0000078100/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQR ... }

(gdb) skip_insert_single page->u.row.ins 0
... printing level 0
$12 = {upd = 0x8067e2800,
  u = {recno = 1099511627800, key = {offset = 24, size = 256}}, next = 0x80670d990}
(gdb) p *$12.upd
$13 = {txnid = 91156, next = 0x0, size = 2407}

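The update on that insert (txnid 91156) is the same transaction recorded as the page's first_dirty_txn and update_txn. This page wasn't a child of another page being evicted; it was itself selected for eviction.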
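The dumps show why "empty" is misleading here: the page has no disk image and entries = 0, yet its insert list still holds a key with an update from transaction 91156. The following minimal C sketch models that shape. The sketch_* types and functions are hypothetical simplifications, not WiredTiger's WT_INSERT/WT_UPDATE; in particular, real row-store leaves keep per-slot insert skiplists, which this sketch collapses into one singly linked list (skiplist level 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical update chain entry for one key. */
struct sketch_update {
    uint64_t txnid;
    struct sketch_update *next; /* older update for the same key */
};

/* Hypothetical insert-list entry (level 0 of a skiplist only). */
struct sketch_insert {
    struct sketch_update *upd;  /* newest update for this key */
    struct sketch_insert *next; /* next inserted key */
};

/* Hypothetical row-store leaf: on-disk rows plus in-memory inserts. */
struct sketch_leaf {
    uint32_t entries;          /* rows from the disk image; 0 if none */
    struct sketch_insert *ins; /* in-memory inserts; NULL if none */
};

/* Count inserted keys that still carry at least one update. */
static size_t sketch_leaf_pending_inserts(const struct sketch_leaf *leaf)
{
    size_t n = 0;
    for (const struct sketch_insert *ins = leaf->ins; ins != NULL; ins = ins->next)
        if (ins->upd != NULL)
            n++;
    return n;
}

/* A leaf holds data if it has on-disk rows OR in-memory inserts. */
static bool sketch_leaf_holds_data(const struct sketch_leaf *leaf)
{
    return leaf->entries != 0 || sketch_leaf_pending_inserts(leaf) != 0;
}
```

Modelled on this report's page (entries = 0, one insert whose update has txnid 91156), sketch_leaf_holds_data() returns true: a check that looked only at the entry count or the disk image would miss the pending insert.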
Issue Links
related to:
- WT-4 Flexible cursor traversals (Closed)
- WT-5 How does pget work: is it necessary? (Closed)
- WT-6 Complex schema example (Closed)
- WT-7 Do we need the handle->err/errx methods? (Closed)
- WT-8 Do we need table load, bulk-load and/or dump methods? (Closed)
- WT-9 Does adding schema need to be transactional? (Closed)
- WT-10 Basic "getting started" tutorial (Closed)
- WT-11 placeholder #11 (Closed)
- WT-12 Write more examples (Closed)
- WT-13 Define supported platforms (Closed)
- WT-14 Windows build (Closed)
- WT-15 Automated build/test infrastructure (Closed)