WiredTiger / WT-1937

clear hazard pointer: 0x7fba540d3ba0: not found

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.6.0
    • Labels: None
    • Replies: 3

      Description

      The following CONFIG file occasionally fails (on the order of once in 20,000 runs on bengal; I've never seen it fail anywhere else).

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      auto_throttle=1
      firstfit=0
      bitcnt=2
      bloom=1
      bloom_bit_count=40
      bloom_hash_count=24
      bloom_oldest=0
      cache=60
      checkpoints=1
      checksum=uncompressed
      chunk_size=7
      compaction=0
      compression=lz4
      data_extend=0
      data_source=file
      delete_pct=25
      dictionary=0
      evict_max=4
      file_type=variable-length column-store
      backups=0
      huffman_key=0
      huffman_value=0
      insert_pct=34
      internal_key_truncation=1
      internal_page_max=13
      isolation=read-committed
      key_gap=17
      key_max=76
      key_min=32
      leak_memory=0
      leaf_page_max=10
      logging=0
      logging_archive=1
      logging_prealloc=1
      lsm_worker_threads=4
      merge_max=17
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=5
      repeat_data_pct=65
      reverse=0
      rows=100000
      runs=10000
      split_pct=48
      statistics=0
      statistics_server=0
      threads=31
      timer=20
      value_max=1916
      value_min=4
      wiredtiger_config=
      write_pct=72
      ############################################
      

      Here's the stack:

      (gdb) where
      #0  0x0000003467632925 in raise () from /lib64/libc.so.6
      #1  0x0000003467634105 in abort () from /lib64/libc.so.6
      #2  0x00000000004e58db in __wt_abort (session=0x17f4fb0)
          at src/os_posix/os_abort.c:25
      #3  0x000000000047104c in __wt_panic (session=0x17f4fb0)
          at src/support/err.c:492
      #4  0x0000000000472632 in __wt_hazard_clear (session=0x17f4fb0, 
          page=0x7fba540d3ba0) at src/support/hazard.c:178
      #5  0x00000000004feef1 in __wt_page_release_evict (session=0x17f4fb0, 
          ref=0x7fba54103dc0) at ./src/include/btree.i:1139
      #6  0x00000000004ff14a in __wt_page_release (session=0x17f4fb0, 
          ref=0x7fba54103dc0, flags=0) at ./src/include/btree.i:1212
      #7  0x00000000004ff703 in __curfile_leave (cbt=0x7fba3803adb0)
          at ./src/include/cursor.i:132
      #8  0x00000000004ff7dc in __cursor_reset (cbt=0x7fba3803adb0)
          at ./src/include/cursor.i:208
      #9  0x00000000005009f1 in __wt_btcur_insert (cbt=0x7fba3803adb0)
          at src/btree/bt_cursor.c:558
      #10 0x00000000004c4e29 in __curfile_insert (cursor=0x7fba3803adb0)
          at src/cursor/cur_file.c:245
      #11 0x00000000004122f5 in col_insert (tinfo=0x18467e0, cursor=0x7fba3803adb0, 
          key=0x7fba64e99df0, value=0x7fba64e99dc0, keynop=0x7fba64e99db8)
          at ops.c:1065
      

      In summary, we have a non-NULL cbt->ref, but we're not holding a hazard reference on it.
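      The panic in the title comes from hazard-pointer bookkeeping: a session publishes a hazard pointer to a page before using it, so eviction leaves the page alone, and releasing the page clears that pointer. Clearing a pointer the session does not hold is the "not found" case. A minimal sketch of that invariant, assuming a fixed-size per-session array; `struct session`, `hazard_set`, `hazard_clear` and `HAZARD_MAX` are hypothetical names, not WiredTiger's real API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified model of a session's hazard-pointer slots.
 * The names and the fixed size are illustrative, not WiredTiger's own. */
#define HAZARD_MAX 8

struct session {
    const void *hazard[HAZARD_MAX]; /* pages this session is using */
};

/* Publish a hazard pointer so the page cannot be evicted while in use.
 * Returns 0 on success, -1 if every slot is taken. */
static int
hazard_set(struct session *s, const void *page)
{
    assert(page != NULL);
    for (int i = 0; i < HAZARD_MAX; i++)
        if (s->hazard[i] == NULL) {
            s->hazard[i] = page;
            return (0);
        }
    return (-1);
}

/* Drop the hazard pointer for a page. If no slot holds the page, the
 * caller is releasing a page it no longer protects: WiredTiger panics
 * in that case, while this sketch reports it and returns -1. */
static int
hazard_clear(struct session *s, const void *page)
{
    for (int i = 0; i < HAZARD_MAX; i++)
        if (s->hazard[i] == page) {
            s->hazard[i] = NULL;
            return (0);
        }
    fprintf(stderr, "clear hazard pointer: %p: not found\n", (void *)page);
    return (-1);
}
```

      Setting the pointer once and clearing it twice, as a double page release would, succeeds on the first clear and takes the "not found" path on the second.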


          Activity

          Keith Bostic added a comment -

          This problem is somehow connected to a cursor not-found error, too. The error I see is equally likely to be:

          t: cursor.insert: WT_NOTFOUND: item not found
          

          Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith@wiredtiger.com>

          Message: Always clear the cursor's page reference after releasing the page,
          regardless of the return, otherwise on error we try to release the
          page again, which panics if the page's hazard pointer was released
          in the first attempt. Reference WT-1937.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/0b2d32526cd5612eeb535b45a69a822bc9d9b616
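          The reasoning in this commit message can be illustrated with a minimal sketch under simplified, stand-in types: `struct ref`, `struct cursor`, `page_release`, `leave_buggy`, `leave_fixed` and `SKETCH_NOTFOUND` are all hypothetical and are not the real `__curfile_leave` or `__wt_page_release`. The release drops the hazard pointer before it can fail, so after any attempt the cursor's reference is stale; the fixed shape clears it unconditionally and still returns the error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not WiredTiger's real types or functions. */
#define SKETCH_NOTFOUND (-31803) /* plays the role of WT_NOTFOUND */

struct ref {
    int hazard_held; /* nonzero while the session protects the page */
};

struct cursor {
    struct ref *ref; /* page the cursor is positioned on, or NULL */
};

static int double_releases; /* counts releases of unprotected pages */

/* Release a page: drop the hazard pointer first, then possibly report an
 * error, as __wt_page_release did when it returned WT_NOTFOUND. */
static int
page_release(struct ref *ref, int fail)
{
    assert(ref != NULL);
    if (!ref->hazard_held) {
        ++double_releases; /* the real code panics here */
        return (-1);
    }
    ref->hazard_held = 0;
    return (fail ? SKETCH_NOTFOUND : 0);
}

/* Buggy shape: the reference survives an error return. */
static int
leave_buggy(struct cursor *c, int fail)
{
    int ret;

    if (c->ref == NULL)
        return (0);
    if ((ret = page_release(c->ref, fail)) != 0)
        return (ret);
    c->ref = NULL;
    return (0);
}

/* Fixed shape: clear the reference regardless of the return value. */
static int
leave_fixed(struct cursor *c, int fail)
{
    int ret;

    if (c->ref == NULL)
        return (0);
    ret = page_release(c->ref, fail);
    c->ref = NULL;
    return (ret);
}
```

          With the buggy shape, a second call on the same cursor releases the page again and lands on the path where WiredTiger panics; with the fixed shape, the second call sees a NULL reference and returns 0.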

          Keith Bostic added a comment -

          Michael Cahill, this one fired last night with some debugging information, and the problem is __curfile_leave not clearing cbt->ref after __wt_page_release returns WT_NOTFOUND.

          I've pushed a change, and am closing this one. (I'll do another set of runs, but assuming the WT_NOTFOUND return is fixed by your change in https://github.com/wiredtiger/wiredtiger/commit/98786589218e4d3f06232633e5be6a7e619fa165, I think we're done.)

          Thanks!

