Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-1221

format stress test failure: bt_discard.c, 56: !__wt_page_is_modified(page)

    • Type: Icon: Task Task
    • Resolution: Done
    • None
    • Affects Version/s: None
    • Component/s: None

      I saw a stress test failure a few days ago and it seems to reproduce relatively quickly on pixiebob. Here's the CONFIG:

      ############################################
      #  RUN PARAMETERS
      ############################################
      auto_throttle=1
      firstfit=1
      bitcnt=3
      bloom=1
      bloom_bit_count=10
      bloom_hash_count=24
      bloom_oldest=1
      cache=15
      checkpoints=1
      checksum=uncompressed
      chunk_size=7
      compaction=0
      compression=zlib
      data_extend=0
      data_source=file
      delete_pct=22
      dictionary=0
      evict_max=0
      file_type=row-store
      backups=0
      huffman_key=0
      huffman_value=0
      insert_pct=13
      internal_key_truncation=1
      internal_page_max=9
      isolation=read-committed
      key_gap=14
      key_max=256
      key_min=256
      leak_memory=0
      leaf_page_max=9
      logging=0
      merge_max=4
      merge_threads=3
      mmap=1
      ops=100000
      prefix_compression=1
      prefix_compression_min=3
      repeat_data_pct=7
      reverse=0
      rows=100000
      runs=100
      split_pct=72
      statistics=0
      threads=13
      value_max=2638
      value_min=10
      wiredtiger_config=
      write_pct=36
      ############################################
      

      Looking at the threads:

      T 12: main, sleeping
      T 11: eviction server, the failing thread
      T 10: sweep server, sleeping
      T 3, 4, 5, 6, 7, 8, 9: worker threads, trying to do a named checkpoint and waiting  on format's hot-backup lock
      T 2: doing a named checkpoint and trying to clear the tree's current eviction point
      T 1: starting a transaction and doing a cache-full check, waiting to find a page on the eviction queue to evict.
      

      Thread 11's stack:

      WT-4  0x00000000004da528 in __wt_abort (session=0x803035280) at src/os_posix/os_abort.c:21
      WT-5  0x0000000000459e3c in __wt_assert (session=0x803035280, error=0, file_name=0x531580 "src/btree/bt_discard.c", 
          line_number=56, fmt=0x522129 "%s") at src/support/err.c:452
      WT-6  0x00000000004f5077 in __wt_page_out (session=0x803035280, pagep=0x807f293a0) at src/btree/bt_discard.c:56
      WT-7  0x00000000004f4fff in __wt_ref_out (session=0x803035280, ref=0x807f293a0) at src/btree/bt_discard.c:32
      WT-8  0x0000000000489898 in __rec_page_dirty_update (session=0x803035280, ref=0x807f293a0, exclusive=0)
          at src/btree/rec_evict.c:164
      WT-9  0x0000000000488c0e in __wt_rec_evict (session=0x803035280, ref=0x807f293a0, exclusive=0) at src/btree/rec_evict.c:88
      WT-10 0x0000000000473e6c in __wt_evict_page (session=0x803035280, ref=0x807f293a0) at src/btree/bt_evict.c:616
      WT-11 0x0000000000474403 in __wt_evict_lru_page (session=0x803035280, is_app=0) at src/btree/bt_evict.c:1218
      WT-12 0x0000000000474e12 in __evict_lru_pages (session=0x803035280, is_app=0) at src/btree/bt_evict.c:701
      WT-13 0x0000000000475879 in __evict_lru (session=0x803035280, flags=3) at src/btree/bt_evict.c:794
      WT-14 0x000000000047514c in __evict_pass (session=0x803035280) at src/btree/bt_evict.c:484
      WT-15 0x000000000047352f in __evict_server (arg=0x803035280) at src/btree/bt_evict.c:162
      
      and we're attempting to discard a dirty page:
      
      52		/*
      53		 * We should never discard a dirty page, the file's current eviction
      54		 * point or a page queued for LRU eviction.
      55		 */
      56		WT_ASSERT(session, !__wt_page_is_modified(page));
      57		WT_ASSERT(session, !F_ISSET_ATOMIC(page, WT_PAGE_EVICT_LRU));
      58		WT_ASSERT(session, !F_ISSET_ATOMIC(page, WT_PAGE_SPLITTING));
      
      and it's empty:
      
      (gdb) p *page 
      $4 = {u = {intl = {recno = 34466205048, parent_ref = 0x8065c10e8, __index = 0x0}, row = {d = 0x806588d78, ins = 0x8065c10e8, 
            upd = 0x0, entries = 0}, col_fix = {recno = 34466205048, bitf = 0x8065c10e8 "", entries = 0}, col_var = {
            recno = 34466205048, d = 0x8065c10e8, repeats = 0x0, nrepeats = 0, entries = 0}}, dsk = 0x0, modify = 0x806597520, 
        read_gen = 130743, memory_footprint = 3195, type = 7 '\a', flags_atomic = 0 '\0'}
      
      (gdb) p *page->modify
      $6 = {disk_snap_min = 0, rec_max_txn = 0, first_dirty_txn = 91156, update_txn = 91156, bytes_dirty = 3195, u1 = {replace = {
            addr = 0x0, size = 0 '\0', type = 2 '\002', reuse = 0 '\0'}, m = {multi = 0x0, multi_entries = 512}}, u2 = {intl = {
            root_split = 0x0, split_txn = 0}, leaf = {append = 0x0, update = 0x0}}, ovfl_track = 0x807704b00, write_gen = 1, 
        page_lock = 2 '\002', flags = 1 '\001'}
      
      but there's one element on the page's insert list:
      
      0x806588d30: row-store leaf
              disk 0x0, entries 0, dirty, empty, write generation=1
              insert {0000078100.00/opqrstuvwxyzabcdefghijklmnop ... }
              value {0000078100/LMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQR ... }
      
      (gdb) skip_insert_single page->u.row.ins 0
      ... printing level 0
      $12 = {upd = 0x8067e2800,
      u = {recno = 1099511627800, key = {offset = 24, size = 256}}, next = 0x80670d990}
      (gdb) p *$12.upd
      $13 = {txnid = 91156, next = 0x0, size = 2407}
      

      This page wasn't a child page of another page being evicted, this page was selected for eviction.

            Assignee:
            Unassigned Unassigned
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: