Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-5117

truncate can race with a truncate being aborted

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.3 Required
    • Component/s: None
    • Labels:
      None

      Description

      Chasing WT-5078, and saw a failure twice that I think is different.

      Here's the failing stacks:

      Sun Sep  8 16:03:17 UTC 2019: -c CONFIG.5078 iteration 4
      [1567959072:302857][54295:0x7f7b2effd700], t, file:wt.wt, WT_SESSION.truncate: __wt_delete_page, 113: ref->page_del->txnid == WT_TXN_ABORTED
      [1567959072:302888][54295:0x7f7b2effd700], t, file:wt.wt, WT_SESSION.truncate: __wt_abort, 22: process ID 54295: waiting for debugger...
      [1567959072:376507][54295:0x7f7b2a7f4700], t, file:wt.wt, WT_SESSION.truncate: __wt_delete_page, 113: ref->page_del->txnid == WT_TXN_ABORTED
      [1567959072:376536][54295:0x7f7b2a7f4700], t, file:wt.wt, WT_SESSION.truncate: __wt_abort, 22: process ID 54295: waiting for debugger...
       
      Thread 22 (Thread 0x7f7b2a7f4700 (LWP 56739)):
      #0  0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0,
          timeout=0x7f7b2a7f3a60) at ../sysdeps/unix/sysv/linux/select.c:41
      #1  0x000055880fa2720e in __wt_sleep (seconds=100, micro_seconds=0) at src/os_posix/os_sleep.c:30
      #2  0x000055880fa1de21 in __wt_abort (session=0x7f7b48963540) at src/os_common/os_abort.c:26
      #3  0x000055880fabdeab in __wt_delete_page (session=0x7f7b48963540, ref=0x558816989ac0,
          skipp=0x7f7b2a7f3b51) at src/btree/bt_delete.c:113
      #4  0x000055880fafe208 in __tree_walk_internal (session=0x7f7b48963540, refp=0x7f7a880d0508,
          walkcntp=0x0, skip_func=0x0, func_cookie=0x0, flags=6274) at src/btree/bt_walk.c:486
      #5  0x000055880fafe568 in __wt_tree_walk (session=0x7f7b48963540, refp=0x7f7a880d0508, flags=6272)
          at src/btree/bt_walk.c:572
      #6  0x000055880fba001a in __wt_btcur_next (cbt=0x7f7a880d0390, truncating=true)
          at src/btree/bt_curnext.c:689
      #7  0x000055880fabb3c5 in __cursor_truncate (session=0x7f7b48963540, start=0x7f7a880d0390,
          stop=0x7f7a895d6410, rmfunc=0x55880fab7880 <__cursor_row_modify>) at src/btree/bt_cursor.c:1676
      #8  0x000055880fabb73d in __wt_btcur_range_truncate (start=0x7f7a880d0390, stop=0x7f7a895d6410)
          at src/btree/bt_cursor.c:1795
      #9  0x000055880fb956dd in __wt_schema_range_truncate (session=0x7f7b48963540,
          start=0x7f7a880d0390, stop=0x7f7a895d6410) at src/schema/schema_truncate.c:141
      #10 0x000055880fa59a42 in __wt_session_range_truncate (session=0x7f7b48963540, uri=0x0,
          start=0x7f7a880d0390, stop=0x7f7a895d6410) at src/session/session_api.c:1417
      #11 0x000055880fa5a415 in __session_truncate (wt_session=0x7f7b48963540, uri=0x0,
          start=0x7f7a880d0390, stop=0x7f7a895d6410, config=0x0) at src/session/session_api.c:1488
      #12 0x000055880f9b6428 in row_truncate (tinfo=0x55881ab231a0, cursor=0x7f7a880d0390) at ops.c:1443
      #13 0x000055880f9b4376 in ops (arg=0x55881ab231a0) at ops.c:879
      #14 0x00007f7b481556db in start_thread (arg=0x7f7b2a7f4700) at pthread_create.c:463
      #15 0x00007f7b4782688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 13 (Thread 0x7f7b2effd700 (LWP 56730)):
      #0  0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0,
          timeout=0x7f7b2effca60) at ../sysdeps/unix/sysv/linux/select.c:41
      #1  0x000055880fa2720e in __wt_sleep (seconds=100, micro_seconds=0) at src/os_posix/os_sleep.c:30
      #2  0x000055880fa1de21 in __wt_abort (session=0x7f7b48968040) at src/os_common/os_abort.c:26
      #3  0x000055880fabdeab in __wt_delete_page (session=0x7f7b48968040, ref=0x558817229730,
          skipp=0x7f7b2effcb51) at src/btree/bt_delete.c:113
      #4  0x000055880fafe208 in __tree_walk_internal (session=0x7f7b48968040, refp=0x7f7ab04086b8,
          walkcntp=0x0, skip_func=0x0, func_cookie=0x0, flags=6274) at src/btree/bt_walk.c:486
      #5  0x000055880fafe568 in __wt_tree_walk (session=0x7f7b48968040, refp=0x7f7ab04086b8, flags=6272)
          at src/btree/bt_walk.c:572
      #6  0x000055880fba001a in __wt_btcur_next (cbt=0x7f7ab0408540, truncating=true)
          at src/btree/bt_curnext.c:689
      #7  0x000055880fabb3c5 in __cursor_truncate (session=0x7f7b48968040, start=0x7f7ab0408540,
          stop=0x7f7ab015ba50, rmfunc=0x55880fab7880 <__cursor_row_modify>) at src/btree/bt_cursor.c:1676
      #8  0x000055880fabb73d in __wt_btcur_range_truncate (start=0x7f7ab0408540, stop=0x7f7ab015ba50)
          at src/btree/bt_cursor.c:1795
      #9  0x000055880fb956dd in __wt_schema_range_truncate (session=0x7f7b48968040,
          start=0x7f7ab0408540, stop=0x7f7ab015ba50) at src/schema/schema_truncate.c:141
      #10 0x000055880fa59a42 in __wt_session_range_truncate (session=0x7f7b48968040, uri=0x0,
          start=0x7f7ab0408540, stop=0x7f7ab015ba50) at src/session/session_api.c:1417
      #11 0x000055880fa5a415 in __session_truncate (wt_session=0x7f7b48968040, uri=0x0,
          start=0x7f7ab0408540, stop=0x7f7ab015ba50, config=0x0) at src/session/session_api.c:1488
      #12 0x000055880f9b6428 in row_truncate (tinfo=0x558813d42fd0, cursor=0x7f7ab0408540) at ops.c:1443
      #13 0x000055880f9b4376 in ops (arg=0x558813d42fd0) at ops.c:879
      #14 0x00007f7b481556db in start_thread (arg=0x7f7b2effd700) at pthread_create.c:463
      #15 0x00007f7b4782688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      I think the two stacks occurred independently, I don't see any relationship between them, they're different WT_REFs, truncating ranges that are disjoint. They are both, however, failing in the aftermath of txnID=18446744073709551615, that is, they're seeing problems with objects deleted by the same transaction.

      (gdb) thread 13
      [Switching to thread 13 (Thread 0x7f7b2effd700 (LWP 56730))]
      #0  0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, 
          timeout=0x7f7b2effca60) at ../sysdeps/unix/sysv/linux/select.c:41
      41	in ../sysdeps/unix/sysv/linux/select.c
      (gdb) frame 3
      #3  0x000055880fabdeab in __wt_delete_page (session=0x7f7b48968040, ref=0x558817229730, 
          skipp=0x7f7b2effcb51) at src/btree/bt_delete.c:113
      113	        WT_ASSERT(session, ref->page_del->txnid == WT_TXN_ABORTED);
      (gdb) p *ref
      $14 = {
        page = 0x0, 
        home = 0x558812c2b390, 
        pindex_hint = 4, 
        state = 3, 
        addr = 0x5588172298a5, 
        key = {
          recno = 64424509741, 
          ikey = 0xf0000012d
        }, 
        page_del = 0x7f7af80acb90, 
        page_las = 0x0, 
        hist = {{
            session = 0x7f7b48968040, 
            name = 0x55880fbc50dc "WT_SESSION.truncate", 
            func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", 
            time_sec = 1567959072, 
            line = 101, 
            state = 3
          }, {
            session = 0x7f7b48961dd0, 
            name = 0x55880fbc50dc "WT_SESSION.truncate", 
            func = 0x55880fbd4390 <__func__.36703> "__wt_delete_page_rollback", 
            time_sec = 1567959072, 
            line = 189, 
            state = 0
          }, {
            session = 0x7f7b48961dd0, 
            name = 0x55880fbc50dc "WT_SESSION.truncate", 
            func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", 
            time_sec = 1567959072, 
            line = 155, 
            state = 1
          }}, 
        histoff = 2
      }
      (gdb) p *ref.page_del
      $15 = {
        txnid = 18446744073709551615, 
        timestamp = 0, 
        durable_timestamp = 0, 
        prepare_state = 0 '\000', 
        previous_state = 0, 
        update_list = 0x0
      }
       
      (gdb) thread 22
      [Switching to thread 22 (Thread 0x7f7b2a7f4700 (LWP 56739))]
      #0  0x00007f7b4781c03f in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, 
          timeout=0x7f7b2a7f3a60) at ../sysdeps/unix/sysv/linux/select.c:41
      41	in ../sysdeps/unix/sysv/linux/select.c
      (gdb) frame 3
      #3  0x000055880fabdeab in __wt_delete_page (session=0x7f7b48963540, ref=0x558816989ac0, 
          skipp=0x7f7b2a7f3b51) at src/btree/bt_delete.c:113
      113	        WT_ASSERT(session, ref->page_del->txnid == WT_TXN_ABORTED);
      (gdb) p *ref
      $16 = {
        page = 0x0, 
        home = 0x558818098770, 
        pindex_hint = 18, 
        state = 3, 
        addr = 0x55881800866d, 
        key = {
          recno = 81604379829, 
          ikey = 0x13000004b5
        }, 
        page_del = 0x7f7acc0e96c0, 
        page_las = 0x0, 
        hist = {{
            session = 0x7f7b48963540, 
            name = 0x55880fbc50dc "WT_SESSION.truncate", 
            func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", 
            time_sec = 1567959072, 
            line = 101, 
            state = 3
          }, {
            session = 0x7f7b48965f70, 
            name = 0x55880fbc50dc "WT_SESSION.truncate", 
            func = 0x55880fbd4390 <__func__.36703> "__wt_delete_page_rollback", 
            time_sec = 1567959072, 
            line = 189, 
            state = 0
          }, {
            session = 0x7f7b48965f70, 
            name = 0x55880fbc50dc "WT_SESSION.truncate", 
            func = 0x55880fbd4290 <__func__.36683> "__wt_delete_page", 
            time_sec = 1567959072, 
            line = 155, 
            state = 1
          }}, 
        histoff = 2
      }
      (gdb) p *ref.page_del
      $17 = {
        txnid = 18446744073709551615, 
        timestamp = 0, 
        durable_timestamp = 0, 
        prepare_state = 0 '\000', 
        previous_state = 0, 
        update_list = 0x0
      }
      

      Here's the CONFIG (copied from WT-5078).

      ############################################
      #  RUN PARAMETERS
      ############################################
      abort=0
      alter=0
      assert_commit_timestamp=1
      assert_read_timestamp=0
      auto_throttle=1
      backups=1
      bitcnt=7
      bloom=1
      bloom_bit_count=5
      bloom_hash_count=28
      bloom_oldest=0
      cache=620
      cache_minimum=20
      checkpoints=wiredtiger
      checkpoint_log_size=71
      checkpoint_wait=59
      checksum=on
      chunk_size=10
      compaction=0
      compression=none
      data_extend=0
      data_source=table
      delete_pct=33
      dictionary=0
      direct_io=0
      encryption=none
      evict_max=2
      file_type=row-store
      firstfit=0
      huffman_key=0
      huffman_value=0
      independent_thread_rng=0
      in_memory=0
      insert_pct=25
      internal_key_truncation=1
      internal_page_max=12
      isolation=snapshot
      key_gap=5
      key_max=28
      key_min=14
      leaf_page_max=10
      leak_memory=0
      logging=0
      logging_archive=0
      logging_compression=none
      logging_file_max=436785
      logging_prealloc=1
      long_running_txn=0
      lsm_worker_threads=3
      memory_page_max=10
      merge_max=15
      mmap=1
      modify_pct=22
      ops=0
      prefix_compression=0
      prefix_compression_min=1
      prepare=0
      quiet=1
      read_pct=16
      rebalance=1
      repeat_data_pct=27
      reverse=1
      rows=1000000
      runs=1
      salvage=1
      split_pct=98
      statistics=0
      statistics_server=0
      threads=31
      timer=4
      timing_stress_aggressive_sweep=0
      timing_stress_checkpoint=0
      timing_stress_lookaside_sweep=0
      timing_stress_split_1=0
      timing_stress_split_2=0
      timing_stress_split_3=0
      timing_stress_split_4=0
      timing_stress_split_5=0
      timing_stress_split_6=0
      timing_stress_split_7=0
      timing_stress_split_8=0
      transaction_timestamps=1
      transaction-frequency=100
      truncate=1
      value_max=883
      value_min=2
      verify=1
      wiredtiger_config=
      write_pct=4
      ############################################
      

        Attachments

          Activity

            People

            • Assignee:
              backlog-server-storage-engines Backlog - Storage Engines Team
              Reporter:
              keith.bostic Keith Bostic
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: