- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
None
There's an in-memory stress test failure I've seen lately:
t, eviction-server: cache eviction server error: WT_RESTART: restart the operation (internal)
I thought for a while it was the same as WT-2576 (it's another heavily-threaded, in-memory variable-length column-store failure), but I've still seen it since that fix was merged.
It may be related: in short, when rewriting pages in-memory, if the page is corrupted, we can fail to insert saved-update records because our "is there a race" tests in serial.i fail, returning WT_RESTART. Of course, there should never be a race when rewriting pages in-memory, because we have exclusive access to the page.
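To make the failure mode concrete, here's a minimal sketch of a serialized insert with an "is there a race" test. It is not the code in serial.i: the names `insert_serial`, `struct node`, `SKETCH_RESTART` and the `exclusive` flag are all hypothetical, and a plain singly-linked list stands in for the real insert lists. It only illustrates the idea that a caller with exclusive access to the page could skip the race test, so the insert can't come back with a restart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not WiredTiger's serial.i. */
#define SKETCH_RESTART (-1) /* stand-in for WT_RESTART */

struct node {
    int key;
    struct node *next;
};

/*
 * insert_serial --
 *     Link new_node into the list at *slot. The caller passes the successor
 *     it saw when it picked the position (expected_next). If the list has
 *     changed since then, a non-exclusive caller is told to restart.
 */
static int
insert_serial(struct node **slot, struct node *expected_next,
    struct node *new_node, int exclusive)
{
    /*
     * The "is there a race" test. With exclusive access nothing else can be
     * modifying the list, so the test is skipped (an assumption of this
     * sketch, not a statement about the real fix).
     */
    if (!exclusive && *slot != expected_next)
        return (SKETCH_RESTART);

    new_node->next = *slot; /* link in front of the current successor */
    *slot = new_node;
    return (0);
}
```

In this sketch, a caller holding a stale `expected_next` gets `SKETCH_RESTART` back when `exclusive` is 0, and the same call succeeds when `exclusive` is 1.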
Here's the CONFIG from a recent PPC stress test failure:
############################################
# RUN PARAMETERS
############################################
abort=0
auto_throttle=1
backups=0
bitcnt=8
bloom=1
bloom_bit_count=54
bloom_hash_count=20
bloom_oldest=0
cache=34
checkpoints=0
checksum=uncompressed
chunk_size=2
compaction=0
compression=none
data_extend=0
data_source=table
delete_pct=16
dictionary=0
direct_io=0
encryption=none
evict_max=1
file_type=variable-length column-store
firstfit=0
huffman_key=0
huffman_value=0
in_memory=1
insert_pct=15
internal_key_truncation=1
internal_page_max=12
isolation=random
key_gap=8
key_max=32
key_min=10
leaf_page_max=17
leak_memory=0
logging=0
logging_archive=1
logging_compression=lz4
logging_prealloc=1
long_running_txn=0
lsm_worker_threads=3
merge_max=17
mmap=1
ops=100000
prefix_compression=0
prefix_compression_min=2
quiet=1
repeat_data_pct=50
reverse=0
rows=100000
runs=100
rebalance=0
salvage=0
split_pct=77
statistics=0
statistics_server=0
threads=24
timer=20
transaction-frequency=99
value_max=80
value_min=17
verify=0
wiredtiger_config=
write_pct=67
############################################