-
Type: Bug
-
Resolution: Gone away
-
Priority: Critical - P2
-
None
-
Affects Version/s: None
-
Component/s: None
-
5
-
Storage - Ra 2020-08-10, Storage - Ra 2020-08-24
When stress testing WT-6479, there is a failure;
[2020/07/01 09:39:36.677] [1593596365:28538][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __wt_time_value_validate, 257: value time window has a start transaction after its stop transaction; time window start: (0, 15663)/(0, 15663)/11276 stop: (0, 15698)/(0, 15698)/11274: Invalid argument [2020/07/01 09:39:36.677] [1593596365:29608][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __cell_check_value_validity, 21: value timestamp window failed validation: Invalid argument [2020/07/01 09:39:36.677] [1593596365:29634][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __cell_check_value_validity, 21: the process must exit and restart: WT_PANIC: WiredTiger library panic [2020/07/01 09:39:36.677] [1593596365:29647][6152:0x7fb1685eb700], file:wt.wt, WT_SESSION.compact: __wt_abort, 28: aborting WiredTiger library [2020/07/01 09:39:36.678] format.sh: /data/mci/de87916c27251c2ca2e3c34bf7da6c2b/wiredtiger/test/format/RUNDIR.139/CONFIG: [2020/07/01 09:39:36.693] ############################################ [2020/07/01 09:39:36.693] # RUN PARAMETERS: V2 [2020/07/01 09:39:36.693] ############################################ [2020/07/01 09:39:36.693] assert.commit_timestamp=0 [2020/07/01 09:39:36.693] assert.read_timestamp=0 [2020/07/01 09:39:36.693] backup=0 [2020/07/01 09:39:36.693] backup.incremental=off [2020/07/01 09:39:36.693] backup.incr_granularity=12840 [2020/07/01 09:39:36.693] btree.bitcnt=1 [2020/07/01 09:39:36.693] btree.compression=snappy [2020/07/01 09:39:36.693] btree.dictionary=0 [2020/07/01 09:39:36.693] btree.huffman_key=0 [2020/07/01 09:39:36.693] btree.huffman_value=0 [2020/07/01 09:39:36.693] btree.internal_key_truncation=0 [2020/07/01 09:39:36.693] btree.internal_page_max=15 [2020/07/01 09:39:36.693] btree.key_gap=19 [2020/07/01 09:39:36.693] btree.key_max=124 [2020/07/01 09:39:36.693] btree.key_min=31 [2020/07/01 09:39:36.693] btree.leaf_page_max=15 [2020/07/01 09:39:36.693] btree.memory_page_max=8 [2020/07/01 09:39:36.693] btree.prefix_compression=1 [2020/07/01 09:39:36.693] btree.prefix_compression_min=8 [2020/07/01 09:39:36.693] btree.repeat_data_pct=1 [2020/07/01 09:39:36.693] btree.reverse=0 [2020/07/01 09:39:36.693] btree.split_pct=74 [2020/07/01 09:39:36.693] btree.value_max=3077 [2020/07/01 09:39:36.693] btree.value_min=3 [2020/07/01 09:39:36.693] cache=80 [2020/07/01 09:39:36.693] cache.evict_max=0 [2020/07/01 09:39:36.693] cache.minimum=20 [2020/07/01 09:39:36.693] checkpoint=on [2020/07/01 09:39:36.693] checkpoint.log_size=196 [2020/07/01 09:39:36.693] checkpoint.wait=28 [2020/07/01 09:39:36.693] disk.checksum=uncompressed [2020/07/01 09:39:36.693] disk.data_extend=1 [2020/07/01 09:39:36.693] disk.direct_io=0 [2020/07/01 09:39:36.693] disk.encryption=none [2020/07/01 09:39:36.693] disk.firstfit=0 [2020/07/01 09:39:36.693] disk.mmap=1 [2020/07/01 09:39:36.693] disk.mmap_all=0 [2020/07/01 09:39:36.693] format.abort=0 [2020/07/01 09:39:36.693] format.independent_thread_rng=0 [2020/07/01 09:39:36.693] format.major_timeout=0 [2020/07/01 09:39:36.693] logging=0 [2020/07/01 09:39:36.693] logging.archive=0 [2020/07/01 09:39:36.693] logging.compression=none [2020/07/01 09:39:36.693] logging.file_max=299776 [2020/07/01 09:39:36.693] logging.prealloc=0 [2020/07/01 09:39:36.693] lsm.auto_throttle=1 [2020/07/01 09:39:36.693] lsm.bloom=1 [2020/07/01 09:39:36.693] lsm.bloom_bit_count=55 [2020/07/01 09:39:36.693] lsm.bloom_hash_count=18 [2020/07/01 09:39:36.693] lsm.bloom_oldest=0 [2020/07/01 09:39:36.693] lsm.chunk_size=2 [2020/07/01 09:39:36.693] lsm.merge_max=14 [2020/07/01 09:39:36.693] lsm.worker_threads=3 [2020/07/01 09:39:36.693] ops.alter=0 [2020/07/01 09:39:36.693] ops.compaction=1 [2020/07/01 09:39:36.693] ops.hs_cursor=1 [2020/07/01 09:39:36.693] ops.pct.delete=25 [2020/07/01 09:39:36.693] ops.pct.insert=3 [2020/07/01 09:39:36.693] ops.pct.modify=5 [2020/07/01 09:39:36.693] ops.pct.read=0 [2020/07/01 09:39:36.693] ops.pct.write=67 [2020/07/01 09:39:36.693] ops.prepare=0 [2020/07/01 09:39:36.693] ops.random_cursor=0 [2020/07/01 09:39:36.693] ops.rebalance=0 [2020/07/01 09:39:36.693] ops.salvage=1 [2020/07/01 09:39:36.693] ops.truncate=1 [2020/07/01 09:39:36.693] ops.verify=1 [2020/07/01 09:39:36.693] quiet=1 [2020/07/01 09:39:36.693] runs=1 [2020/07/01 09:39:36.693] runs.in_memory=0 [2020/07/01 09:39:36.693] runs.ops=0 [2020/07/01 09:39:36.693] runs.rows=1000000 [2020/07/01 09:39:36.693] runs.source=table [2020/07/01 09:39:36.693] runs.threads=4 [2020/07/01 09:39:36.693] runs.timer=6 [2020/07/01 09:39:36.693] runs.type=row-store [2020/07/01 09:39:36.693] runs.verify_failure_dump=0 [2020/07/01 09:39:36.693] statistics=0 [2020/07/01 09:39:36.693] statistics.server=0 [2020/07/01 09:39:36.693] stress.aggressive_sweep=0 [2020/07/01 09:39:36.693] stress.checkpoint=0 [2020/07/01 09:39:36.693] stress.hs_checkpoint_delay=0 [2020/07/01 09:39:36.693] stress.hs_sweep=0 [2020/07/01 09:39:36.693] stress.split_1=0 [2020/07/01 09:39:36.693] stress.split_2=0 [2020/07/01 09:39:36.693] stress.split_3=0 [2020/07/01 09:39:36.693] stress.split_4=0 [2020/07/01 09:39:36.693] stress.split_5=0 [2020/07/01 09:39:36.693] stress.split_6=0 [2020/07/01 09:39:36.693] stress.split_7=0 [2020/07/01 09:39:36.693] stress.split_8=1 [2020/07/01 09:39:36.693] transaction.frequency=100 [2020/07/01 09:39:36.693] transaction.isolation=snapshot [2020/07/01 09:39:36.694] transaction.rollback_to_stable=0 [2020/07/01 09:39:36.694] transaction.timestamps=1 [2020/07/01 09:39:36.694] wiredtiger.config= [2020/07/01 09:39:36.694] wiredtiger.rwlock=1 [2020/07/01 09:39:36.694] wiredtiger.leak_memory=0 [2020/07/01 09:39:36.694] ############################################
I don't think this issue is caused by WT-6479. I looked into the core dump and the transaction ids are out of order:
(gdb) bt #0 0x00007fb1728f5e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007fb1728f7801 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x000000000075fc34 in __wt_abort (session=0x7fb173efeee0) at ../src/os_common/os_abort.c:30 #3 0x00000000008867f0 in __wt_panic_func (session=0x7fb173efeee0, error=22, func=0xe33600 <__func__.__cell_check_value_validity> "__cell_check_value_validity", line=21, fmt=0xe33640 <str> "value timestamp window failed validation") at ../src/support/err.c:378 #4 0x0000000000d1fd23 in __cell_check_value_validity (session=0x7fb173efeee0, tw=0x7fb1685e7780, expected_error=false) at ../src/include/cell.i:20 #5 0x0000000000d1f249 in __cell_pack_value_validity (session=0x7fb173efeee0, pp=0x7fb1685e72e0, tw=0x7fb1685e7780) at ../src/include/cell.i:45 #6 0x0000000000d1ed4f in __wt_cell_pack_value (session=0x7fb173efeee0, cell=0x61d0002f70c0, tw=0x7fb1685e7780, rle=0, size=16) at ../src/include/cell.i:233 #7 0x0000000000d0ce0f in __wt_rec_cell_build_val (session=0x7fb173efeee0, r=0x61d0002f6c80, data=0x606000290ce7, size=16, tw=0x7fb1685e7780, rle=0) at ../src/include/reconcile.i:386 #8 0x0000000000d16653 in __wt_rec_row_leaf (session=0x7fb173efeee0, r=0x61d0002f6c80, pageref=0x60e0000a7940, salvage=0x0) at ../src/reconcile/rec_row.c:838 #9 0x00000000007be527 in __reconcile (session=0x7fb173efeee0, ref=0x60e0000a7940, salvage=0x0, flags=9, page_lockedp=0x7fb1685e8ae0) at ../src/reconcile/rec_write.c:182 #10 0x00000000007bd08f in __wt_reconcile (session=0x7fb173efeee0, ref=0x60e0000a7940, salvage=0x0, flags=9) at ../src/reconcile/rec_write.c:89 #11 0x0000000000b33a53 in __wt_sync_file (session=0x7fb173efeee0, syncop=WT_SYNC_CHECKPOINT) at ../src/btree/bt_sync.c:594 #12 0x0000000000917a2e in __checkpoint_tree (session=0x7fb173efeee0, is_checkpoint=true, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:1619 #13 0x0000000000921cbb in __checkpoint_tree_helper (session=0x7fb173efeee0, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:1727 #14 0x0000000000921a4d in __checkpoint_apply_to_dhandles (session=0x7fb173efeee0, cfg=0x7fb1685ea220, op=0x921b20 <__checkpoint_tree_helper>) at ../src/txn/txn_ckpt.c:199 #15 0x000000000091a726 in __txn_checkpoint (session=0x7fb173efeee0, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:872 #16 0x0000000000915b76 in __txn_checkpoint_wrapper (session=0x7fb173efeee0, cfg=0x7fb1685ea220) at ../src/txn/txn_ckpt.c:1086 #17 0x00000000009156ab in __wt_txn_checkpoint (session=0x7fb173efeee0, cfg=0x7fb1685ea220, waiting=false) at ../src/txn/txn_ckpt.c:1145 #18 0x0000000000879faa in __compact_checkpoint (session=0x7fb173efeee0) at ../src/session/session_compact.c:214 #19 0x0000000000877ef3 in __compact_worker (session=0x7fb173efeee0) at ../src/session/session_compact.c:313 #20 0x0000000000874daa in __wt_session_compact (wt_session=0x7fb173efeee0, uri=0x611000000040 "table:wt", config=0x0) at ../src/session/session_compact.c:398 #21 0x00000000004fcd12 in compact (arg=0x0) at ../../../test/format/compact.c:69 #22 0x00007fb1738a66db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #23 0x00007fb1729d888f in clone () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) p *$6 $7 = {txnid = 11274, durable_ts = 15698, start_ts = 15698, next = 0x606000290cc0, size = 0, type = 4 '\004', prepare_state = 0 '\000', flags = 0 '\000', data = 0x6040000e3c77 ""} (gdb) p *$6->next $8 = {txnid = 11276, durable_ts = 15663, start_ts = 15663, next = 0x0, size = 16, type = 3 '\003', prepare_state = 0 '\000', flags = 8 '\b', data = 0x606000290ce7 "0000121634/LMNOP"} (gdb) p *vpack $10 = {cell = 0x63200541c434, v = 0, data = 0x63200541c43e, size = 16, __len = 26, prefix = 0 '\000', raw = 128 '\200', type = 128 '\200', flags = 0 '\000', tw = {durable_start_ts = 15663, start_ts = 15663, start_txn = 11276, durable_stop_ts = 0, stop_ts = 18446744073709551615, stop_txn = 18446744073709551605, prepare = 0 '\000'}}
- depends on
-
WT-6546 Update fast truncate to use newest start durable ts
- Closed
-
WT-6548 Investigate fast truncate solution with new aggregate time information only in in-memory structures.
- Closed
- has to be done after
-
WT-6479 Don't insert updates after the first globally visible update to the history store
- Closed
- is depended on by
-
WT-6520 Evict non globally visible truncate
- Closed
- is related to
-
WT-6325 Fast truncate can lead to never resolved prepared operations
- Closed
-
WT-6576 Fix the aborted on-disk prepared key
- Closed
- related to
-
WT-6545 Ensure Truncate operation does not fast truncate pages with active prepared updates.
- Backlog
-
WT-6547 Investigate repurposing of oldest_start_ts as newest_start_ts and oldest_start_txn as newest_start_txn in aggregated timewindow.
- Closed