Type: Bug
Resolution: Cannot Reproduce
Priority: Major - P3
Affects Version/s: WT2.5.0, WT2.6.1
Component/s: None
Hi!
We hit two similar segmentation faults in the WT_SESSION::checkpoint() function, on versions 2.5.0 and 2.6.1. The first stack trace (2.6.1) is:
#4 <signal handler called>
#5 __rec_txn_read (session=0x7ffff119e200, r=0x7fffee465c00, ins=0x7fff31452880, rip=0x0, vpack=0x0, updp=0x7fffef3fb9a0) at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:861
#6 0x00007ffff1ae7696 in __rec_row_leaf_insert (session=0x7ffff119e200, r=0x7fffee465c00, ins=<optimized out>) at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:4694
#7 0x00007ffff1adfa09 in __rec_row_leaf (session=0x7ffff119e200, r=<optimized out>, page=<optimized out>, salvage=<optimized out>) at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:4664
#8 __wt_reconcile (session=0x7ffff119e200, ref=0x7fff42e22610, salvage=<optimized out>, flags=0) at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:413
#9 0x00007ffff1a774e4 in __wt_cache_op (session=<optimized out>, ckptbase=<optimized out>, op=<optimized out>) at /wiredtiger/2.6.1/src/src/btree/bt_sync.c:77
#10 0x00007ffff1b059d3 in __checkpoint_apply (session=<optimized out>, cfg=<optimized out>, op=<optimized out>) at /wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:184
#11 0x00007ffff1b0486a in __wt_txn_checkpoint (session=<optimized out>, cfg=<optimized out>) at /wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:405
#12 0x00007ffff1afbb28 in __session_checkpoint (wt_session=<optimized out>, config=<optimized out>) at /wiredtiger/2.6.1/src/src/session/session_api.c:997
Some gdb output from this core:
(gdb) f 5
#5 __rec_txn_read (session=0x7ffff119e200, r=0x7fffee465c00, ins=0x7fff31452880, rip=0x0, vpack=0x0, updp=0x7fffef3fb9a0) at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:861
(gdb) x/5i $pc
=> 0x7ffff1ae3c00 <__rec_txn_read+192>: mov (%rbx),%rax
   0x7ffff1ae3c03 <__rec_txn_read+195>: cmp $0xffffffffffffffff,%rax
   0x7ffff1ae3c07 <__rec_txn_read+199>: je 0x7ffff1ae3dc0 <__rec_txn_read+640>
   0x7ffff1ae3c0d <__rec_txn_read+205>: cmp %rax,%r13
   0x7ffff1ae3c10 <__rec_txn_read+208>: cmovb %rax,%r13
(gdb) p *upd
$1 = {txnid = 1, next = 0x200, size = 4294967295}
(gdb) p /x $rbx
$2 = 0x200
The second stack trace (2.5.0) is:
#1 <signal handler called>
#2 0x00007fdc814e03c1 in __rec_txn_read (session=0x7fdc6db4ea00, r=0x7fdc6385b300, ins=0x7fdbfc78bd60, rip=0x0, vpack=0x0, updp=0x7fdc64bfa750) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:849
#3 0x00007fdc814dfc2d in __rec_row_leaf_insert (session=0x7fdc6db4ea00, r=0x7fdc6385b300, ins=0x7fdbfc78bd60) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:4591
#4 0x00007fdc814d9cef in __rec_row_leaf (session=0x7fdc6db4ea00, r=0x7fdc6385b300, page=0x7fdbf767e640, salvage=0x0) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:4561
#5 0x00007fdc814d5c6d in __wt_reconcile (session=0x7fdc6db4ea00, ref=0x7fdbf7695160, salvage=0x0, flags=0) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:411
#6 0x00007fdc814439c2 in __sync_file (session=0x7fdc6db4ea00, syncop=16) at /wiredtiger/2.5.0/src/src/btree/bt_sync.c:69
#7 0x00007fdc81443720 in __wt_cache_op (session=0x7fdc6db4ea00, ckptbase=0x0, op=16) at /wiredtiger/2.5.0/src/src/btree/bt_sync.c:222
#8 0x00007fdc8150bedb in __checkpoint_write_leaves (session=0x7fdc6db4ea00, cfg=0x7fdc64bfac50) at /wiredtiger/2.5.0/src/src/txn/txn_ckpt.c:279
#9 0x00007fdc8150bd40 in __checkpoint_apply (session=0x7fdc6db4ea00, cfg=0x7fdc64bfac50, op=0x7fdc8150beb0 <__checkpoint_write_leaves>) at /wiredtiger/2.5.0/src/src/txn/txn_ckpt.c:183
#10 0x00007fdc8150ae26 in __wt_txn_checkpoint (session=0x7fdc6db4ea00, cfg=0x7fdc64bfac50) at /wiredtiger/2.5.0/src/src/txn/txn_ckpt.c:361
#11 0x00007fdc814fd1f6 in __session_checkpoint (wt_session=0x7fdc6db4ea00, config=0x0) at /wiredtiger/2.5.0/src/src/session/session_api.c:895
gdb output from this core:
(gdb) f 2
#2 0x00007fdc814e03c1 in __rec_txn_read (session=0x7fdc6db4ea00, r=0x7fdc6385b300, ins=0x7fdbfc78bd60, rip=0x0, vpack=0x0, updp=0x7fdc64bfa750) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:849
(gdb) x/5i $pc
=> 0x7fdc814e03c1 <__rec_txn_read+257>: mov (%rax),%rax
   0x7fdc814e03c4 <__rec_txn_read+260>: mov %rax,-0xa0(%rbp)
   0x7fdc814e03cb <__rec_txn_read+267>: cmp $0xffffffffffffffff,%rax
   0x7fdc814e03d1 <__rec_txn_read+273>: jne 0x7fdc814e03dc <__rec_txn_read+284>
   0x7fdc814e03d7 <__rec_txn_read+279>: jmpq 0x7fdc814e04ec <__rec_txn_read+556>
(gdb) p *upd
Cannot access memory at address 0x28ab
(gdb) p /x $rax
$1 = 0x28ab
It looks like in both cores the crash happens because the upd pointer holds garbage (0x200 in the first core, 0x28ab in the second) when upd->txnid is read in the following code from reconcile/rec_write.c:
for (max_txn = WT_TXN_NONE, min_txn = UINT64_MAX, upd = upd_list;
    upd != NULL; upd = upd->next) {
        if ((txnid = upd->txnid) == WT_TXN_ABORTED)
                continue;
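To make the failure mode easier to discuss, here is a minimal standalone sketch of that kind of list walk. It is not the WiredTiger code: the names upd_sketch, scan_updates, SKETCH_TXN_NONE and SKETCH_TXN_ABORTED are hypothetical, the struct only mirrors the three fields gdb printed (txnid, next, size), the constant values are assumptions, and the min/max tracking after the continue is my guess at what the loop body does. The point is only that each iteration dereferences upd, so a non-NULL garbage value left in a next field (such as 0x200) passes the upd != NULL check and then faults on the upd->txnid load.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; not copied from the WiredTiger headers. */
#define SKETCH_TXN_NONE 0
#define SKETCH_TXN_ABORTED UINT64_MAX

/* Hypothetical stand-in for the update struct, using the fields gdb printed. */
typedef struct upd_sketch {
    uint64_t txnid;
    struct upd_sketch *next;
    uint32_t size;
} upd_sketch;

/*
 * Walk an update chain, skip aborted updates, and record the largest and
 * smallest transaction IDs seen. The only guard on upd is the NULL check,
 * so a corrupted next pointer is dereferenced on the following iteration.
 */
static void
scan_updates(const upd_sketch *upd_list, uint64_t *max_txnp, uint64_t *min_txnp)
{
    const upd_sketch *upd;
    uint64_t max_txn, min_txn, txnid;

    for (max_txn = SKETCH_TXN_NONE, min_txn = UINT64_MAX, upd = upd_list;
        upd != NULL; upd = upd->next) {
        if ((txnid = upd->txnid) == SKETCH_TXN_ABORTED)
            continue;
        if (txnid > max_txn)
            max_txn = txnid;
        if (txnid < min_txn)
            min_txn = txnid;
    }
    *max_txnp = max_txn;
    *min_txnp = min_txn;
}
```

With a well-formed chain the walk ends at the NULL terminator. Nothing in the loop can tell a valid next pointer from a stale or overwritten one, which would be consistent with both faulting addresses being small non-NULL values.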
Unfortunately, I do not know how to reproduce these crashes.