Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: None
This ticket has evolved. The biggest problem appears to be that, with zlib compression, we are creating a page on disk that is 2.5MB when uncompressed, while the cache size is only 1MB. A page swap involving that page therefore causes eviction to stall.
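The arithmetic behind that stall can be sketched as follows. This is only an illustration under stated assumptions: `cache_pct_full` and `page_fits_in_cache` are hypothetical helpers, not WiredTiger functions, and they do not reproduce WiredTiger's actual cache accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEGABYTE (UINT64_C(1024) * 1024)

/*
 * Hypothetical helper (not a WiredTiger API): cache usage expressed as an
 * integer percentage of the configured cache size, rounded down.
 */
uint64_t
cache_pct_full(uint64_t bytes_inuse, uint64_t cache_size)
{
	assert(cache_size > 0);
	return (bytes_inuse * 100 / cache_size);
}

/*
 * Hypothetical helper (not a WiredTiger API): true if a single page of the
 * given in-memory size can be held without exceeding the cache size.
 */
bool
page_fits_in_cache(uint64_t page_size, uint64_t cache_size)
{
	return (page_size <= cache_size);
}
```

With the sizes from this ticket, `cache_pct_full(5 * MEGABYTE / 2, 1 * MEGABYTE)` evaluates to 250 and `page_fits_in_cache(5 * MEGABYTE / 2, 1 * MEGABYTE)` is false: the single leaf page alone puts the cache well over 100% full, so evicting the other, smaller pages cannot bring usage back under the limit.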
Old analysis follows:
There is a test/format workload that has hung due to a full cache. The configuration has a single worker thread and a 1MB cache.
The only application thread is:
#2 0x000000000043196e in __wt_cond_wait (session=0x2b778b0, cond=0x2b75b50,
usecs=100000) at ../src/include/misc.i:18
#3 0x0000000000435d08 in __wt_cache_eviction_worker (session=0x2b778b0,
busy=false, pct_full=328) at ../src/evict/evict_lru.c:1544
#4 0x00000000004a5802 in __wt_cache_eviction_check (session=0x2b778b0,
busy=false, didworkp=0x0) at ../src/include/cache.i:236
#5 0x00000000004a5f54 in __wt_txn_begin (session=0x2b778b0, cfg=0x0)
at ../src/include/txn.i:266
#6 0x00000000004a5fd4 in __wt_txn_autocommit_check (session=0x2b778b0)
at ../src/include/txn.i:287
#7 0x00000000004a842f in __wt_page_in_func (session=0x2b778b0, ref=0x345a5c0,
flags=0, file=0x6c2f45 "../src/btree/col_srch.c", line=93)
at ../src/btree/bt_read.c:575
#8 0x00000000004c237d in __wt_page_swap_func (session=0x2b778b0,
held=0x2b75ec0, want=0x345a5c0, flags=0,
file=0x6c2f45 "../src/btree/col_srch.c", line=93)
at ../src/include/btree.i:1260
#9 0x00000000004c2baa in __wt_col_search (session=0x2b778b0, recno=87641,
leaf=0x0, cbt=0x7f23b0032d40) at ../src/btree/col_srch.c:93
#10 0x0000000000516f4d in __cursor_col_search (session=0x2b778b0,
cbt=0x7f23b0032d40, leaf=0x0) at ../src/btree/bt_cursor.c:226
#11 0x0000000000518821 in __wt_btcur_remove (cbt=0x7f23b0032d40)
at ../src/btree/bt_cursor.c:670
#12 0x00000000004da976 in __curfile_remove (cursor=0x7f23b0032d40)
at ../src/cursor/cur_file.c:331
#13 0x00000000004131fa in col_remove (cursor=0x7f23b0032d40,
key=0x7f23c891cdf0, keyno=87641, notfoundp=0x7f23c891cda4)
at ../../../test/format/ops.c:1183
#14 0x00000000004115fa in ops (arg=0x255cd50) at ../../../test/format/ops.c:426
That is, it is an auto-commit transaction that is doing a cache-full check before allocating an ID.
There are 5 pages in cache, 848 bytes of which are on internal pages. Two of the pages belong to file:wt: one is a small internal page, the other is a 2.5MB leaf page.
An oddity is that the session that is in __wt_txn_begin already has a snapshot allocated:
(gdb) p session->txn
$30 = {id = 0, isolation = WT_ISO_SNAPSHOT, snap_min = 4, snap_max = 4,
snapshot = 0x7f23b0032960, snapshot_count = 0, txn_logsync = 0, mod = 0x0,
mod_alloc = 0, mod_count = 0, logrec = 0x0, notify = 0x0, ckpt_lsn = {
file = 0, offset = 0}, full_ckpt = false, ckpt_nsnapshot = 0,
ckpt_snapshot = 0x0, flags = 8}
This keeps the system-wide snap_min pinned at 4:
(gdb) p $3->txn_global
$22 = {current = 4, last_running = 4, oldest_id = 4, scan_count = 0,
checkpoint_id = 0, checkpoint_gen = 0, checkpoint_pinned = 0,
nsnap_rwlock = 0x2b731b0, nsnap_oldest_id = 0, nsnaph = {tqh_first = 0x0,
tqh_last = 0x2b6b478}, states = 0x2b94d80}
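As a rough illustration of how one session's snapshot can pin a system-wide oldest ID, here is a minimal sketch. It assumes a simplified model in which the oldest ID is the minimum `snap_min` over all sessions that hold a snapshot, capped by the current ID. The `session_state` struct and `oldest_pinned_id` function are hypothetical and are not WiredTiger's data structures or code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-session state; not a WiredTiger structure. */
struct session_state {
	uint64_t snap_min;	/* lowest ID the session's snapshot depends on */
	int has_snapshot;	/* non-zero if the session holds a snapshot */
};

/*
 * Hypothetical helper: return the oldest ID that must stay pinned, taken as
 * the minimum snap_min over sessions holding a snapshot. If no session holds
 * a snapshot, nothing older than the current ID is pinned.
 */
uint64_t
oldest_pinned_id(uint64_t current, const struct session_state *sessions,
    size_t nsessions)
{
	uint64_t oldest = current;
	size_t i;

	for (i = 0; i < nsessions; i++)
		if (sessions[i].has_snapshot && sessions[i].snap_min < oldest)
			oldest = sessions[i].snap_min;
	return (oldest);
}
```

In this model, a session that keeps a snapshot with `snap_min = 4` holds the result at 4 no matter how far the current ID advances, which is the kind of pinning the dumps above suggest.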
is depended on by: SERVER-22146 WiredTiger changes for 3.3.1 (Closed)