WiredTiger / WT-938

New split issues

    • Type: Task
    • Resolution: Done
    • WT2.2
    • Affects Version/s: None
    • Component/s: None

      @keithbostic, here's what I've been seeing today:

      • read of pindex structures after they are freed:
      ==13645==ERROR: AddressSanitizer: heap-use-after-free on address 0x632000420800 at pc 0x790851 bp 0x7ffff42a3910 sp 0x7ffff42a3908
      READ of size 4 at 0x632000420800 thread T6
          #0 0x790850 in __wt_page_refp /home/mjc/wt/src/wiredtiger/build_posix/../src/include/btree.i:243
          #1 0x78bebb in __wt_tree_walk /home/mjc/wt/src/wiredtiger/build_posix/../src/btree/bt_walk.c:242
          #2 0x6a8d57 in __wt_sync_file /home/mjc/wt/src/wiredtiger/build_posix/../src/btree/bt_evict.c:604
      
      freed by thread T41 here:
          #2 0x620de1 in __wt_session_fotxn_discard /home/mjc/wt/src/wiredtiger/build_posix/../src/session/session_misc.c:78
          #3 0x62014b in __wt_session_fotxn_add /home/mjc/wt/src/wiredtiger/build_posix/../src/session/session_misc.c:42
          #4 0x7aa093 in __wt_split_evict /home/mjc/wt/src/wiredtiger/build_posix/../src/btree/rec_split.c:695
      

      This is at least partly caused by paths into __wt_tree_walk that don't come through the cursor code, and so don't have a transaction (e.g., the eviction server thread). That means the "fotxn" code can free things as they are being read.

      The fix for this one is to: (a) set up a snap_min in the global transaction table for all paths into __wt_tree_walk, and (b) change __wt_session_fotxn_discard to use __wt_txn_visible_all to determine when items can safely be freed.
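
      A minimal sketch of the idea behind (a) and (b), not the actual WiredTiger code: every reader publishes a snap_min before walking, and retired memory is freed only once its transaction ID is older than every published snap_min. All names here (reader_snap_min, reader_enter, defer_free, discard_deferred, txn_visible_all) are hypothetical stand-ins, and the sketch is single-threaded; real code would need atomic operations and memory barriers around the shared table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_READERS 8   /* hypothetical table size */
#define MAX_DEFERRED 16 /* hypothetical queue size */

/* Hypothetical stand-in for the global transaction table; 0 = not reading. */
uint64_t reader_snap_min[MAX_READERS];
uint64_t next_txn_id = 1;

/* Memory retired by a split, waiting until no reader can still see it. */
struct deferred_free {
    uint64_t txnid; /* transaction ID current when the memory was retired */
    void *ptr;
};
struct deferred_free deferred[MAX_DEFERRED];
size_t deferred_count;

/* (a) Every path into the tree walk publishes a snap_min first. */
void reader_enter(int slot)
{
    assert(slot >= 0 && slot < MAX_READERS);
    reader_snap_min[slot] = next_txn_id;
}

void reader_leave(int slot)
{
    assert(slot >= 0 && slot < MAX_READERS);
    reader_snap_min[slot] = 0;
}

/* True if the ID is older than the snap_min of every active reader. */
int txn_visible_all(uint64_t id)
{
    for (size_t i = 0; i < MAX_READERS; i++)
        if (reader_snap_min[i] != 0 && reader_snap_min[i] <= id)
            return 0;
    return 1;
}

/* Queue memory instead of freeing it; returns -1 if the queue is full. */
int defer_free(void *ptr)
{
    if (deferred_count == MAX_DEFERRED)
        return -1;
    deferred[deferred_count].txnid = next_txn_id++;
    deferred[deferred_count].ptr = ptr;
    deferred_count++;
    return 0;
}

/* (b) Free only the entries no active reader can still reference. */
size_t discard_deferred(void)
{
    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < deferred_count; i++) {
        if (txn_visible_all(deferred[i].txnid)) {
            free(deferred[i].ptr);
            freed++;
        } else
            deferred[kept++] = deferred[i];
    }
    deferred_count = kept;
    return freed;
}
```

      In this sketch, a walker that called reader_enter before a split retired an index keeps that index alive until it calls reader_leave. A thread that walks without publishing a snap_min gets no such protection, which matches the use-after-free in the trace above.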

      • while testing that fix, I have seen this:
      [1396351164:764588][94892:00078ae9ff7f0000], file:test.wt, cursor.search: read checksum error [4096B @ 489816064, 1165726993 != 3615684229]
      [1396351164:764649][94892:00078ae9ff7f0000], file:test.wt, cursor.search: test.wt: encountered an illegal file format or internal value
      [1396351164:764673][94892:00078ae9ff7f0000], file:test.wt, cursor.search: aborting WiredTiger library
      

      I don't have any more information on this yet: it's just happening in an ordinary search, reading in a page.

      I'll keep running my test overnight, and clean up the "fotxn" fix tomorrow. These take a while to fire for me: things are still looking good overall.

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            michael.cahill@mongodb.com Michael Cahill (Inactive)