Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6494

__wt_time_value_validate failed because start transaction id is larger than stop transaction id

    • 5
    • Storage - Ra 2020-08-10, Storage - Ra 2020-08-24

      When stress testing WT-6479, there is a failure;

      [2020/07/01 09:39:36.677]     [1593596365:28538][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __wt_time_value_validate, 257: value time window has a start transaction after its stop transaction; time window start: (0, 15663)/(0, 15663)/11276 stop: (0, 15698)/(0, 15698)/11274: Invalid argument
       [2020/07/01 09:39:36.677]     [1593596365:29608][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __cell_check_value_validity, 21: value timestamp window failed validation: Invalid argument
       [2020/07/01 09:39:36.677]     [1593596365:29634][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __cell_check_value_validity, 21: the process must exit and restart: WT_PANIC: WiredTiger library panic
       [2020/07/01 09:39:36.677]     [1593596365:29647][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __wt_abort, 28: aborting WiredTiger library
       [2020/07/01 09:39:36.678] format.sh: /data/mci/de87916c27251c2ca2e3c34bf7da6c2b/wiredtiger/test/format/RUNDIR.139/CONFIG:
       [2020/07/01 09:39:36.693]     ############################################
       [2020/07/01 09:39:36.693]     #  RUN PARAMETERS: V2
       [2020/07/01 09:39:36.693]     ############################################
       [2020/07/01 09:39:36.693]     assert.commit_timestamp=0
       [2020/07/01 09:39:36.693]     assert.read_timestamp=0
       [2020/07/01 09:39:36.693]     backup=0
       [2020/07/01 09:39:36.693]     backup.incremental=off
       [2020/07/01 09:39:36.693]     backup.incr_granularity=12840
       [2020/07/01 09:39:36.693]     btree.bitcnt=1
       [2020/07/01 09:39:36.693]     btree.compression=snappy
       [2020/07/01 09:39:36.693]     btree.dictionary=0
       [2020/07/01 09:39:36.693]     btree.huffman_key=0
       [2020/07/01 09:39:36.693]     btree.huffman_value=0
       [2020/07/01 09:39:36.693]     btree.internal_key_truncation=0
       [2020/07/01 09:39:36.693]     btree.internal_page_max=15
       [2020/07/01 09:39:36.693]     btree.key_gap=19
       [2020/07/01 09:39:36.693]     btree.key_max=124
       [2020/07/01 09:39:36.693]     btree.key_min=31
       [2020/07/01 09:39:36.693]     btree.leaf_page_max=15
       [2020/07/01 09:39:36.693]     btree.memory_page_max=8
       [2020/07/01 09:39:36.693]     btree.prefix_compression=1
       [2020/07/01 09:39:36.693]     btree.prefix_compression_min=8
       [2020/07/01 09:39:36.693]     btree.repeat_data_pct=1
       [2020/07/01 09:39:36.693]     btree.reverse=0
       [2020/07/01 09:39:36.693]     btree.split_pct=74
       [2020/07/01 09:39:36.693]     btree.value_max=3077
       [2020/07/01 09:39:36.693]     btree.value_min=3
       [2020/07/01 09:39:36.693]     cache=80
       [2020/07/01 09:39:36.693]     cache.evict_max=0
       [2020/07/01 09:39:36.693]     cache.minimum=20
       [2020/07/01 09:39:36.693]     checkpoint=on
       [2020/07/01 09:39:36.693]     checkpoint.log_size=196
       [2020/07/01 09:39:36.693]     checkpoint.wait=28
       [2020/07/01 09:39:36.693]     disk.checksum=uncompressed
       [2020/07/01 09:39:36.693]     disk.data_extend=1
       [2020/07/01 09:39:36.693]     disk.direct_io=0
       [2020/07/01 09:39:36.693]     disk.encryption=none
       [2020/07/01 09:39:36.693]     disk.firstfit=0
       [2020/07/01 09:39:36.693]     disk.mmap=1
       [2020/07/01 09:39:36.693]     disk.mmap_all=0
       [2020/07/01 09:39:36.693]     format.abort=0
       [2020/07/01 09:39:36.693]     format.independent_thread_rng=0
       [2020/07/01 09:39:36.693]     format.major_timeout=0
       [2020/07/01 09:39:36.693]     logging=0
       [2020/07/01 09:39:36.693]     logging.archive=0
       [2020/07/01 09:39:36.693]     logging.compression=none
       [2020/07/01 09:39:36.693]     logging.file_max=299776
       [2020/07/01 09:39:36.693]     logging.prealloc=0
       [2020/07/01 09:39:36.693]     lsm.auto_throttle=1
       [2020/07/01 09:39:36.693]     lsm.bloom=1
       [2020/07/01 09:39:36.693]     lsm.bloom_bit_count=55
       [2020/07/01 09:39:36.693]     lsm.bloom_hash_count=18
       [2020/07/01 09:39:36.693]     lsm.bloom_oldest=0
       [2020/07/01 09:39:36.693]     lsm.chunk_size=2
       [2020/07/01 09:39:36.693]     lsm.merge_max=14
       [2020/07/01 09:39:36.693]     lsm.worker_threads=3
       [2020/07/01 09:39:36.693]     ops.alter=0
       [2020/07/01 09:39:36.693]     ops.compaction=1
       [2020/07/01 09:39:36.693]     ops.hs_cursor=1
       [2020/07/01 09:39:36.693]     ops.pct.delete=25
       [2020/07/01 09:39:36.693]     ops.pct.insert=3
       [2020/07/01 09:39:36.693]     ops.pct.modify=5
       [2020/07/01 09:39:36.693]     ops.pct.read=0
       [2020/07/01 09:39:36.693]     ops.pct.write=67
       [2020/07/01 09:39:36.693]     ops.prepare=0
       [2020/07/01 09:39:36.693]     ops.random_cursor=0
       [2020/07/01 09:39:36.693]     ops.rebalance=0
       [2020/07/01 09:39:36.693]     ops.salvage=1
       [2020/07/01 09:39:36.693]     ops.truncate=1
       [2020/07/01 09:39:36.693]     ops.verify=1
       [2020/07/01 09:39:36.693]     quiet=1
       [2020/07/01 09:39:36.693]     runs=1
       [2020/07/01 09:39:36.693]     runs.in_memory=0
       [2020/07/01 09:39:36.693]     runs.ops=0
       [2020/07/01 09:39:36.693]     runs.rows=1000000
       [2020/07/01 09:39:36.693]     runs.source=table
       [2020/07/01 09:39:36.693]     runs.threads=4
       [2020/07/01 09:39:36.693]     runs.timer=6
       [2020/07/01 09:39:36.693]     runs.type=row-store
       [2020/07/01 09:39:36.693]     runs.verify_failure_dump=0
       [2020/07/01 09:39:36.693]     statistics=0
       [2020/07/01 09:39:36.693]     statistics.server=0
       [2020/07/01 09:39:36.693]     stress.aggressive_sweep=0
       [2020/07/01 09:39:36.693]     stress.checkpoint=0
       [2020/07/01 09:39:36.693]     stress.hs_checkpoint_delay=0
       [2020/07/01 09:39:36.693]     stress.hs_sweep=0
       [2020/07/01 09:39:36.693]     stress.split_1=0
       [2020/07/01 09:39:36.693]     stress.split_2=0
       [2020/07/01 09:39:36.693]     stress.split_3=0
       [2020/07/01 09:39:36.693]     stress.split_4=0
       [2020/07/01 09:39:36.693]     stress.split_5=0
       [2020/07/01 09:39:36.693]     stress.split_6=0
       [2020/07/01 09:39:36.693]     stress.split_7=0
       [2020/07/01 09:39:36.693]     stress.split_8=1
       [2020/07/01 09:39:36.693]     transaction.frequency=100
       [2020/07/01 09:39:36.693]     transaction.isolation=snapshot
       [2020/07/01 09:39:36.694]     transaction.rollback_to_stable=0
       [2020/07/01 09:39:36.694]     transaction.timestamps=1
       [2020/07/01 09:39:36.694]     wiredtiger.config=
       [2020/07/01 09:39:36.694]     wiredtiger.rwlock=1
       [2020/07/01 09:39:36.694]     wiredtiger.leak_memory=0
       [2020/07/01 09:39:36.694]     ############################################
      

      I don't think this issue is caused by WT-6479. I looked into the core dump and the transaction ids are out of order:

      (gdb) bt
      #0  0x00007fb1728f5e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007fb1728f7801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x000000000075fc34 in __wt_abort (session=0x7fb173efeee0) at ../src/os_common/os_abort.c:30
      #3  0x00000000008867f0 in __wt_panic_func (session=0x7fb173efeee0, error=22, func=0xe33600 <__func__.__cell_check_value_validity> "__cell_check_value_validity",
          line=21, fmt=0xe33640 <str> "value timestamp window failed validation") at ../src/support/err.c:378
      #4  0x0000000000d1fd23 in __cell_check_value_validity (session=0x7fb173efeee0, tw=0x7fb1685e7780, expected_error=false) at ../src/include/cell.i:20
      #5  0x0000000000d1f249 in __cell_pack_value_validity (session=0x7fb173efeee0, pp=0x7fb1685e72e0, tw=0x7fb1685e7780) at ../src/include/cell.i:45
      #6  0x0000000000d1ed4f in __wt_cell_pack_value (session=0x7fb173efeee0, cell=0x61d0002f70c0, tw=0x7fb1685e7780, rle=0, size=16) at ../src/include/cell.i:233
      #7  0x0000000000d0ce0f in __wt_rec_cell_build_val (session=0x7fb173efeee0, r=0x61d0002f6c80, data=0x606000290ce7, size=16, tw=0x7fb1685e7780, rle=0)
          at ../src/include/reconcile.i:386
      #8  0x0000000000d16653 in __wt_rec_row_leaf (session=0x7fb173efeee0, r=0x61d0002f6c80, pageref=0x60e0000a7940, salvage=0x0) at ../src/reconcile/rec_row.c:838
      #9  0x00000000007be527 in __reconcile (session=0x7fb173efeee0, ref=0x60e0000a7940, salvage=0x0, flags=9, page_lockedp=0x7fb1685e8ae0)
          at ../src/reconcile/rec_write.c:182
      #10 0x00000000007bd08f in __wt_reconcile (session=0x7fb173efeee0, ref=0x60e0000a7940, salvage=0x0, flags=9) at ../src/reconcile/rec_write.c:89
      #11 0x0000000000b33a53 in __wt_sync_file (session=0x7fb173efeee0, syncop=WT_SYNC_CHECKPOINT) at ../src/btree/bt_sync.c:594
      #12 0x0000000000917a2e in __checkpoint_tree (session=0x7fb173efeee0, is_checkpoint=true, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:1619
      #13 0x0000000000921cbb in __checkpoint_tree_helper (session=0x7fb173efeee0, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:1727
      #14 0x0000000000921a4d in __checkpoint_apply_to_dhandles (session=0x7fb173efeee0, cfg=0x7fb1685ea220, op=0x921b20 <__checkpoint_tree_helper>)
          at ../src/txn/txn_ckpt.c:199
      #15 0x000000000091a726 in __txn_checkpoint (session=0x7fb173efeee0, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:872
      #16 0x0000000000915b76 in __txn_checkpoint_wrapper (session=0x7fb173efeee0, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:1086
      #17 0x00000000009156ab in __wt_txn_checkpoint (session=0x7fb173efeee0, cfg=0x7fb1685ea220, waiting=false) at ../src/txn/txn_ckpt.c:1145
      #18 0x0000000000879faa in __compact_checkpoint (session=0x7fb173efeee0) at ../src/session/session_compact.c:214
      #19 0x0000000000877ef3 in __compact_worker (session=0x7fb173efeee0) at ../src/session/session_compact.c:313
      #20 0x0000000000874daa in __wt_session_compact (wt_session=0x7fb173efeee0, uri=0x611000000040 "table:wt", config=0x0) at ../src/session/session_compact.c:398
      #21 0x00000000004fcd12 in compact (arg=0x0) at ../../../test/format/compact.c:69
      #22 0x00007fb1738a66db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
      #23 0x00007fb1729d888f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      
      (gdb) p *$6
      $7 = {txnid = 11274, durable_ts = 15698, start_ts = 15698, next = 0x606000290cc0, size = 0, type = 4 '\004', prepare_state = 0 '\000', flags = 0 '\000',
        data = 0x6040000e3c77 ""}
      (gdb) p *$6->next
      $8 = {txnid = 11276, durable_ts = 15663, start_ts = 15663, next = 0x0, size = 16, type = 3 '\003', prepare_state = 0 '\000', flags = 8 '\b',
        data = 0x606000290ce7 "0000121634/LMNOP"}
      (gdb) p *vpack
      $10 = {cell = 0x63200541c434, v = 0, data = 0x63200541c43e, size = 16, __len = 26, prefix = 0 '\000', raw = 128 '\200', type = 128 '\200', flags = 0 '\000',
        tw = {durable_start_ts = 15663, start_ts = 15663, start_txn = 11276, durable_stop_ts = 0, stop_ts = 18446744073709551615, stop_txn = 18446744073709551605,
          prepare = 0 '\000'}}
      

            Assignee:
            chenhao.qu@mongodb.com Chenhao Qu
            Reporter:
            chenhao.qu@mongodb.com Chenhao Qu
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: