WiredTiger / WT-899

test/format segfault #2

    • Type: Task
    • Resolution: Done
    • WT2.2
    • Affects Version/s: None
    • Component/s: None

      Hopefully all of these are caused by a few underlying bugs that are just manifesting in different ways. The Jenkins job wiredtiger-test-format-stress hit a segfault that is different from WT-896 and WT-898. Here's the CONFIG:

      ############################################
      #  RUN PARAMETERS
      ############################################
      auto_throttle=1
      firstfit=0
      # bitcnt not applicable to this run
      bloom=1
      bloom_bit_count=55
      bloom_hash_count=5
      bloom_oldest=1
      cache=95
      checksum=uncompressed
      chunk_size=1
      compaction=0
      compression=none
      data_extend=0
      data_source=table
      delete_pct=2
      dictionary=0
      file_type=row-store
      hot_backups=0
      huffman_key=0
      huffman_value=0
      insert_pct=23
      internal_key_truncation=1
      internal_page_max=15
      key_gap=0
      key_max=127
      key_min=12
      leaf_page_max=9
      merge_max=5
      merge_threads=2
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=4
      repeat_data_pct=66
      reverse=0
      rows=100000
      runs=100
      split_pct=59
      statistics=0
      threads=27
      value_max=1461
      value_min=16
      # wiredtiger_config not applicable to this run
      write_pct=55
      ############################################
      

      Here's the stack:

      (gdb) bt
      #0  0x00000000004704c3 in __rec_row_int (session=0xcf10c0, r=0x7f627001c5e0, 
          page=0x7f6264016210) at ../src/btree/rec_write.c:3141
      #1  0x000000000046afd9 in __wt_rec_write (session=0xcf10c0, 
          page=0x7f6264016210, salvage=0x0, flags=0) at ../src/btree/rec_write.c:362
      #2  0x000000000044ff0d in __wt_sync_file (session=0xcf10c0, syncop=8)
          at ../src/btree/bt_evict.c:649
      #3  0x000000000045fb68 in __wt_bt_cache_op (session=0xcf10c0, 
          ckptbase=0x7f6270012a00, op=8) at ../src/btree/bt_sync.c:53
      #4  0x0000000000449ad7 in __checkpoint_worker (session=0xcf10c0, 
          cfg=0x7f62875fdcf0, is_checkpoint=1) at ../src/txn/txn_ckpt.c:750
      #5  0x0000000000449ce1 in __wt_checkpoint (session=0xcf10c0, 
          cfg=0x7f62875fdcf0) at ../src/txn/txn_ckpt.c:802
      #6  0x0000000000497c6e in __wt_meta_btree_apply (session=0xcf10c0, 
          func=0x449c79 <__wt_checkpoint>, cfg=0x7f62875fdcf0)
          at ../src/meta/meta_apply.c:45
      #7  0x0000000000448838 in __checkpoint_apply (session=0xcf10c0, 
          cfg=0x7f62875fdcf0, op=0x449c79 <__wt_checkpoint>, fullp=0x0)
          at ../src/txn/txn_ckpt.c:129
      #8  0x0000000000448b1e in __wt_txn_checkpoint (session=0xcf10c0, 
          cfg=0x7f62875fdcf0) at ../src/txn/txn_ckpt.c:243
      #9  0x000000000043fe2d in __session_checkpoint (wt_session=0xcf10c0, 
          config=0x7f62875fdd80 "name=thread-6") at ../src/session/session_api.c:716
      #10 0x000000000040fb9c in ops (arg=0xd09890) at ../../../test/format/ops.c:272
      #11 0x0000003789207851 in start_thread () from /lib64/libpthread.so.0
      #12 0x0000003788ae767d in clone () from /lib64/libc.so.6
      

      The addr is NULL and we're dereferencing it at line 3141:

      (gdb) list
      3136			 * cell type has been set in the case of page deletion requiring
      3137			 * a proxy cell, otherwise use the information from the addr or
      3138			 * original cell.
      3139			 */
      3140			if (__wt_off_page(page, addr)) {
      3141				p = addr->addr;
      3142				size = addr->size;
      3143				if (vtype == 0)
      3144					vtype = __rec_vtype(addr);
      3145			} else {
      (gdb) p *page
      $1 = {parent = 0x7f6288003610, ref = 0x7f62880037f8, u = {intl = {recno = 0, 
            t = 0x7f6264016268}, row = {d = 0x0, ins = 0x7f6264016268, upd = 0x0}, 
          col_fix = {recno = 0, bitf = 0x7f6264016268 " \233\070$b\177"}, col_var = {
            recno = 0, d = 0x7f6264016268, repeats = 0x0, nrepeats = 0}}, 
        dsk = 0x7f6264011600, modify = 0x7f62348d6c60, read_gen = 101, 
        memory_footprint = 32341, entries = 317, type = 6 '\006', 
        flags_atomic = 2 '\002'}
      (gdb) p *addr
      Cannot access memory at address 0x0
      

      Interestingly, the local addr, which is NULL and does not change AFAICT, comes from ref->addr, yet ref->addr is not NULL when we crash. Similarly, rp comes from ref->page and holds an address, but ref->page is NULL when we crash. Perhaps we're racing with someone else modifying this ref.

      (gdb) p *ref
      $2 = {page = 0x0, addr = 0x7f620802f920, key = {recno = 38654723707, 
          ikey = 0x90000467b, pkey = 38654723707}, txnid = 0, state = WT_REF_DISK, 
        unused = 0}
      (gdb) p rp
      $7 = (WT_PAGE *) 0x7f622801ed20
      (gdb) p *rp
      $8 = {parent = 0xabababababababab, ref = 0xabababababababab, u = {intl = {
            recno = 12370169555311111083, t = 0xabababababababab}, row = {
            d = 0xabababababababab, ins = 0xabababababababab, 
            upd = 0xabababababababab}, col_fix = {recno = 12370169555311111083, 
            bitf = 0xabababababababab <Address 0xabababababababab out of bounds>}, 
          col_var = {recno = 12370169555311111083, d = 0xabababababababab, 
            repeats = 0xabababababababab, nrepeats = 2880154539}}, 
        dsk = 0xabababababababab, modify = 0xabababababababab, 
        read_gen = 12370169555311111083, memory_footprint = 12370169555311111083, 
        entries = 2880154539, type = 171 '\253', flags_atomic = 171 '\253'}
      

      Most application threads are waiting on an application lock g.backup_lock.
      The eviction server has an interesting stack. Perhaps we raced eviction:

      Thread 12 (Thread 0x7f628f7c1700 (LWP 32662)):
      #0  __wt_page_hazard_check (session=0xcefa40, page=0x7f625c1a8c10)
          at ../src/include/btree.i:732
      #1  0x00000000004666e7 in __hazard_exclusive (session=0xcefa40, 
          ref=0x7f6264017ac8, top=1) at ../src/btree/rec_evict.c:562
      #2  0x00000000004660c1 in __rec_review (session=0xcefa40, ref=0x7f6264017ac8, 
          page=0x7f625c1a8c10, exclusive=0, merge=0, top=1, 
          inmem_split=0x7f628f7c0c90, istree=0x7f628f7c0c8c)
          at ../src/btree/rec_evict.c:306
      #3  0x000000000046593b in __wt_rec_evict (session=0xcefa40, 
          page=0x7f625c1a8c10, exclusive=0) at ../src/btree/rec_evict.c:62
      #4  0x000000000044f933 in __wt_evict_page (session=0xcefa40, 
          page=0x7f625c1a8c10) at ../src/btree/bt_evict.c:403
      #5  0x0000000000450f89 in __wt_evict_lru_page (session=0xcefa40, is_app=0)
          at ../src/btree/bt_evict.c:1242
      #6  0x00000000004502d9 in __evict_lru (session=0xcefa40, flags=2)
          at ../src/btree/bt_evict.c:762
      #7  0x000000000044f595 in __evict_worker (session=0xcefa40)
          at ../src/btree/bt_evict.c:272
      #8  0x000000000044f104 in __wt_cache_evict_server (arg=0xcefa40)
          at ../src/btree/bt_evict.c:164
      #9  0x0000003789207851 in start_thread () from /lib64/libpthread.so.0
      #10 0x0000003788ae767d in clone () from /lib64/libc.so.6
      

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso