WiredTiger / WT-938

New split issues

Details

    • Task
    • Status: Closed
    • Resolution: Done
    • None
    • WT2.2
    • None

    Description

      @keithbostic, here's what I've been seeing today:

      • read of pindex structures after they are freed:

      ==13645==ERROR: AddressSanitizer: heap-use-after-free on address 0x632000420800 at pc 0x790851 bp 0x7ffff42a3910 sp 0x7ffff42a3908
      READ of size 4 at 0x632000420800 thread T6
          #0 0x790850 in __wt_page_refp /home/mjc/wt/src/wiredtiger/build_posix/../src/include/btree.i:243
          #1 0x78bebb in __wt_tree_walk /home/mjc/wt/src/wiredtiger/build_posix/../src/btree/bt_walk.c:242
          #2 0x6a8d57 in __wt_sync_file /home/mjc/wt/src/wiredtiger/build_posix/../src/btree/bt_evict.c:604
       
      freed by thread T41 here:
          #2 0x620de1 in __wt_session_fotxn_discard /home/mjc/wt/src/wiredtiger/build_posix/../src/session/session_misc.c:78
          #3 0x62014b in __wt_session_fotxn_add /home/mjc/wt/src/wiredtiger/build_posix/../src/session/session_misc.c:42
          #4 0x7aa093 in __wt_split_evict /home/mjc/wt/src/wiredtiger/build_posix/../src/btree/rec_split.c:695
      

      This is at least partly caused by paths into __wt_tree_walk that don't come through the cursor code and so don't have a transaction (e.g., the eviction server thread). Without a transaction, nothing tells the "fotxn" code that a reader is active, so it can free things while they are being read.

      The fix for this one is to: (a) set up a snap_min in the global transaction table for all paths into __wt_tree_walk, and (b) change fotxn_discard to use __wt_txn_visible_all to determine when items can safely be freed.
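The idea behind (a) and (b) can be sketched in isolation. This is not the WiredTiger code: every type and function name below (txn_global, reader_enter, stash_add, stash_discard, visible_all) is hypothetical and only stands in for the global transaction table, the "fotxn" stash and the visibility check. It is single-threaded for clarity; a real implementation would need atomic operations and memory barriers around the shared table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_READERS 4
#define MAX_STASH 8
#define TXN_NONE UINT64_MAX /* reader slot is idle */

/* Hypothetical stand-in for the global transaction table. */
struct txn_global {
    uint64_t current;               /* next ID to hand out */
    uint64_t snap_min[MAX_READERS]; /* oldest ID each reader may still need */
};

/* Memory waiting to be freed, tagged with the ID at which it was retired. */
struct stash_entry { void *p; uint64_t txnid; };
struct stash { struct stash_entry e[MAX_STASH]; size_t cnt; };

void txn_global_init(struct txn_global *g)
{
    g->current = 1;
    for (size_t i = 0; i < MAX_READERS; i++)
        g->snap_min[i] = TXN_NONE;
}

/* Step (a): every path into the tree walk publishes a snap_min first. */
void reader_enter(struct txn_global *g, size_t slot) { g->snap_min[slot] = g->current; }
void reader_leave(struct txn_global *g, size_t slot) { g->snap_min[slot] = TXN_NONE; }

/* Smallest published snap_min, or TXN_NONE if no reader is active. */
uint64_t oldest_snap_min(const struct txn_global *g)
{
    uint64_t oldest = TXN_NONE;
    for (size_t i = 0; i < MAX_READERS; i++)
        if (g->snap_min[i] < oldest)
            oldest = g->snap_min[i];
    return oldest;
}

/* True if every active reader entered after this ID was allocated. */
bool visible_all(const struct txn_global *g, uint64_t id)
{
    return id < oldest_snap_min(g);
}

/* Retire memory instead of freeing it: tag it with a fresh ID. */
void stash_add(struct stash *s, struct txn_global *g, void *p)
{
    assert(s->cnt < MAX_STASH);
    s->e[s->cnt].p = p;
    s->e[s->cnt].txnid = g->current++;
    s->cnt++;
}

/* Step (b): free only entries no reader can still see; return how many. */
size_t stash_discard(struct stash *s, const struct txn_global *g)
{
    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < s->cnt; i++) {
        if (visible_all(g, s->e[i].txnid)) {
            free(s->e[i].p);
            freed++;
        } else
            s->e[kept++] = s->e[i];
    }
    s->cnt = kept;
    return freed;
}
```

In this sketch a thread that calls reader_enter before memory is retired keeps that memory alive until it calls reader_leave. A thread that walks the tree without publishing a snap_min gets no such protection, which is the situation described above for the eviction server.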

      • while testing that fix, I have seen this:

      [1396351164:764588][94892:00078ae9ff7f0000], file:test.wt, cursor.search: read checksum error [4096B @ 489816064, 1165726993 != 3615684229]
      [1396351164:764649][94892:00078ae9ff7f0000], file:test.wt, cursor.search: test.wt: encountered an illegal file format or internal value
      [1396351164:764673][94892:00078ae9ff7f0000], file:test.wt, cursor.search: aborting WiredTiger library
      

      I don't have any more information on this yet: it's just happening in an ordinary search, reading in a page.

      I'll keep running my test overnight, and clean up the "fotxn" fix tomorrow. These take a while to fire for me: things are still looking good overall.

            People

              keith.bostic@mongodb.com Keith Bostic (Inactive)
              michael.cahill@mongodb.com Michael Cahill