  1. WiredTiger
  2. WT-2351

Segmentation fault during WT_SESSION::checkpoint

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • None
    • Affects Version/s: WT2.5.0, WT2.6.1
    • Component/s: None
    • Labels: None

      Hi!
      We hit two similar segmentation faults in WT_SESSION::checkpoint(), one on version 2.6.1 and one on version 2.5.0. The first stack trace (2.6.1) is:

      #4  <signal handler called>
      #5  __rec_txn_read (session=0x7ffff119e200, r=0x7fffee465c00, ins=0x7fff31452880, rip=0x0, vpack=0x0, updp=0x7fffef3fb9a0)   at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:861
      #6  0x00007ffff1ae7696 in __rec_row_leaf_insert (session=0x7ffff119e200, r=0x7fffee465c00, ins=<optimized out>)   at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:4694
      #7  0x00007ffff1adfa09 in __rec_row_leaf (session=0x7ffff119e200, r=<optimized out>, page=<optimized out>, salvage=<optimized out>)   at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:4664
      #8  __wt_reconcile (session=0x7ffff119e200, ref=0x7fff42e22610, salvage=<optimized out>, flags=0) at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:413
      #9  0x00007ffff1a774e4 in __wt_cache_op (session=<optimized out>, ckptbase=<optimized out>, op=<optimized out>) at /wiredtiger/2.6.1/src/src/btree/bt_sync.c:77
      #10 0x00007ffff1b059d3 in __checkpoint_apply (session=<optimized out>, cfg=<optimized out>, op=<optimized out>) at /wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:184
      #11 0x00007ffff1b0486a in __wt_txn_checkpoint (session=<optimized out>, cfg=<optimized out>) at /wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:405
      #12 0x00007ffff1afbb28 in __session_checkpoint (wt_session=<optimized out>, config=<optimized out>) at /wiredtiger/2.6.1/src/src/session/session_api.c:997
      

      Some gdb prints:

      (gdb) f 5
      #5  __rec_txn_read (session=0x7ffff119e200, r=0x7fffee465c00, ins=0x7fff31452880, rip=0x0, vpack=0x0, updp=0x7fffef3fb9a0)
          at /wiredtiger/2.6.1/src/src/reconcile/rec_write.c:861
      (gdb) x/5i $pc
      => 0x7ffff1ae3c00 <__rec_txn_read+192>:	mov    (%rbx),%rax
         0x7ffff1ae3c03 <__rec_txn_read+195>:	cmp    $0xffffffffffffffff,%rax
         0x7ffff1ae3c07 <__rec_txn_read+199>:	je     0x7ffff1ae3dc0 <__rec_txn_read+640>
         0x7ffff1ae3c0d <__rec_txn_read+205>:	cmp    %rax,%r13
         0x7ffff1ae3c10 <__rec_txn_read+208>:	cmovb  %rax,%r13
      (gdb) p *upd
      $1 = {txnid = 1, next = 0x200, size = 4294967295}
      (gdb) p /x $rbx
      $2 = 0x200
      

      Second stack:

      #1  <signal handler called>
      #2  0x00007fdc814e03c1 in __rec_txn_read (session=0x7fdc6db4ea00, r=0x7fdc6385b300, ins=0x7fdbfc78bd60, rip=0x0, vpack=0x0, updp=0x7fdc64bfa750) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:849
      #3  0x00007fdc814dfc2d in __rec_row_leaf_insert (session=0x7fdc6db4ea00, r=0x7fdc6385b300, ins=0x7fdbfc78bd60) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:4591
      #4  0x00007fdc814d9cef in __rec_row_leaf (session=0x7fdc6db4ea00, r=0x7fdc6385b300, page=0x7fdbf767e640, salvage=0x0) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:4561
      #5  0x00007fdc814d5c6d in __wt_reconcile (session=0x7fdc6db4ea00, ref=0x7fdbf7695160, salvage=0x0, flags=0) at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:411
      #6  0x00007fdc814439c2 in __sync_file (session=0x7fdc6db4ea00, syncop=16) at /wiredtiger/2.5.0/src/src/btree/bt_sync.c:69
      #7  0x00007fdc81443720 in __wt_cache_op (session=0x7fdc6db4ea00, ckptbase=0x0, op=16) at /wiredtiger/2.5.0/src/src/btree/bt_sync.c:222
      #8  0x00007fdc8150bedb in __checkpoint_write_leaves (session=0x7fdc6db4ea00, cfg=0x7fdc64bfac50) at /wiredtiger/2.5.0/src/src/txn/txn_ckpt.c:279
      #9  0x00007fdc8150bd40 in __checkpoint_apply (session=0x7fdc6db4ea00, cfg=0x7fdc64bfac50, op=0x7fdc8150beb0 <__checkpoint_write_leaves>) at /wiredtiger/2.5.0/src/src/txn/txn_ckpt.c:183
      #10 0x00007fdc8150ae26 in __wt_txn_checkpoint (session=0x7fdc6db4ea00, cfg=0x7fdc64bfac50) at /wiredtiger/2.5.0/src/src/txn/txn_ckpt.c:361
      #11 0x00007fdc814fd1f6 in __session_checkpoint (wt_session=0x7fdc6db4ea00, config=0x0) at /wiredtiger/2.5.0/src/src/session/session_api.c:895
      

      gdb prints:

      (gdb) f 2
      #2  0x00007fdc814e03c1 in __rec_txn_read (session=0x7fdc6db4ea00, r=0x7fdc6385b300, ins=0x7fdbfc78bd60, rip=0x0, vpack=0x0, updp=0x7fdc64bfa750)
          at /wiredtiger/2.5.0/src/src/reconcile/rec_write.c:849
      (gdb) x/5i $pc
      => 0x7fdc814e03c1 <__rec_txn_read+257>:	mov    (%rax),%rax
         0x7fdc814e03c4 <__rec_txn_read+260>:	mov    %rax,-0xa0(%rbp)
         0x7fdc814e03cb <__rec_txn_read+267>:	cmp    $0xffffffffffffffff,%rax
         0x7fdc814e03d1 <__rec_txn_read+273>:	jne    0x7fdc814e03dc <__rec_txn_read+284>
         0x7fdc814e03d7 <__rec_txn_read+279>:	jmpq   0x7fdc814e04ec <__rec_txn_read+556>
      (gdb) p *upd
      Cannot access memory at address 0x28ab
      (gdb) p /x $rax
      $1 = 0x28ab
      

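      In both cores the crash appears to be caused by a garbage pointer in the update chain: the fault occurs while reading upd->txnid in the following loop from reconcile/rec_write.c. In the 2.6.1 core the faulting load is from address 0x200, which matches the next field of the update printed above; in the 2.5.0 core it is from address 0x28ab.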

      for (max_txn = WT_TXN_NONE, min_txn = UINT64_MAX, upd = upd_list;
          upd != NULL; upd = upd->next) {
              if ((txnid = upd->txnid) == WT_TXN_ABORTED)
                      continue;
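
To illustrate the failure mode, here is a minimal, self-contained sketch of the same kind of chain walk. It is not WiredTiger code: `struct upd` is a simplified stand-in with the fields in the order gdb printed them, `TXN_NONE` and `TXN_ABORTED` are stand-ins for `WT_TXN_NONE` and `WT_TXN_ABORTED` (the all-ones value is an assumption, consistent with the `cmp $0xffffffffffffffff` in the disassembly), and `ptr_is_suspect` and `chain_max_txn` are hypothetical helpers. The heuristic only shows why values such as 0x200 or 0x28ab cannot be valid update pointers; it says nothing about how the chain was corrupted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXN_NONE    0ULL        /* stand-in for WT_TXN_NONE */
#define TXN_ABORTED UINT64_MAX  /* stand-in for WT_TXN_ABORTED (assumed value) */

/* Simplified stand-in for the update structure; not the real definition. */
struct upd {
    uint64_t txnid;     /* first field, so reading it is a load at offset 0 */
    struct upd *next;
    uint32_t size;
};

/*
 * Hypothetical heuristic: a non-NULL pointer that falls inside the first
 * 4096 bytes of the address space, or that is not 8-byte aligned, cannot
 * point at a heap-allocated update.
 */
int
ptr_is_suspect(const struct upd *p)
{
    uintptr_t v = (uintptr_t)p;

    return (p != NULL && (v < 4096 || v % sizeof(uint64_t) != 0));
}

/*
 * Walk the chain the way the loop above does, but test each pointer
 * before dereferencing it. Returns 0 and stores the largest non-aborted
 * transaction ID, or returns -1 if a suspect pointer is found.
 */
int
chain_max_txn(const struct upd *list, uint64_t *max_txnp)
{
    const struct upd *u;
    uint64_t max_txn = TXN_NONE;

    assert(max_txnp != NULL);
    for (u = list; u != NULL; u = u->next) {
        if (ptr_is_suspect(u))
            return (-1);    /* the unchecked loop would fault here */
        if (u->txnid == TXN_ABORTED)
            continue;
        if (u->txnid > max_txn)
            max_txn = u->txnid;
    }
    *max_txnp = max_txn;
    return (0);
}
```

With an update whose next field is 0x200, as in the 2.6.1 core, `chain_max_txn` returns -1 at the point where the unchecked loop would dereference the bad pointer. A check like this could help narrow down when the chain goes bad in a debug build, but it is a diagnostic aid under the assumptions above, not a fix.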
      

      Unfortunately, I do not know how to reproduce these crashes.

            Assignee: backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter: Denis Shkirya
            Votes: 0
            Watchers: 4
