WiredTiger / WT-8808

Data validation failure in test_timestamp_abort

      This ticket is a follow-on for WT-8392. The signature of the failure is:

      [2021/11/11 05:28:34.215] CONFIG: test_timestamp_abort -s -h WT_TEST.timestamp-abort -T 5 -t 10
      [2021/11/11 05:28:34.215] Kill child
      [2021/11/11 05:28:34.215] Open database, run recovery and verify content
      [2021/11/11 05:28:34.215] Got stable_val 228976
      [2021/11/11 05:28:34.215] records-1: LOCAL no record with key 1000024190
      [2021/11/11 05:28:34.215] LOCAL: 1 record(s) absent from 117316 
      

      There have been two failures on ubuntu2004-small hosts, both in test_timestamp_abort -s (i.e. the stress variant), where a record is absent from the local table after crash and recovery. The failure has never been reproduced in deliberate attempts and has occurred only twice in several months.

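      The suspicion is a file system bug. Both failures indicate that the local update completed, which means the thread wrote its insert into the WT log, and that log record would have been written to the OS buffer cache before the call returned. Only after that does the application write the record into its text file. A record that the application recorded but that is missing from the table after recovery therefore suggests that the logged data did not survive, rather than that the insert never happened.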
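
The ordering argument above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it does not use WiredTiger at all, a plain text file stands in for the WT log, and the function names (`write_in_order`, `absent_keys`, `demo`) are hypothetical. Key 1000024190 is taken from the failure output; the neighbouring keys are made up for illustration.

```python
import os
import tempfile


def write_in_order(log_path, record_path, keys):
    """Write each key to the stand-in log first, then to the text file."""
    with open(log_path, "a") as log, open(record_path, "a") as rec:
        for key in keys:
            log.write(f"{key}\n")  # stand-in for the insert reaching the WT log
            log.flush()            # hands data to the OS; it is not an fsync
            rec.write(f"{key}\n")  # the application records the key only afterwards
            rec.flush()


def absent_keys(log_path, record_path):
    """Return keys the application recorded that the stand-in log lacks."""
    with open(log_path) as log:
        recovered = {line.strip() for line in log if line.strip()}
    with open(record_path) as rec:
        recorded = [line.strip() for line in rec if line.strip()]
    return [key for key in recorded if key not in recovered]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "log.txt")
        record_path = os.path.join(tmp, "records.txt")
        write_in_order(log_path, record_path, [1000024189, 1000024190, 1000024191])
        clean = absent_keys(log_path, record_path)

        # Simulate the suspected failure: one log record silently disappears.
        with open(log_path) as log:
            kept = [line for line in log if line.strip() != "1000024190"]
        with open(log_path, "w") as log:
            log.writelines(kept)
        damaged = absent_keys(log_path, record_path)
    return clean, damaged
```

With the ordering intact, `demo()` reports no absent keys before the simulated loss and exactly one afterwards, which is the shape of the failure signature: the application's own record of the key exists, but the recovered data does not contain it.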

      In WT-8392, debugging was added and enabled for stress runs to record pwrite operations and print the thread and key written. If the failure recurs, the output will show whether the pwrite succeeded and the offset and length of the record in the log file.
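
As a rough illustration of that kind of instrumentation (not the actual WT-8392 change, which lives in WiredTiger's C code), the sketch below wraps Python's `os.pwrite`, available on Unix-like systems, and records the thread, key, offset, length and outcome of each write. The names `logged_pwrite` and `demo_trace` and the trace-entry layout are assumptions made for this example.

```python
import os
import tempfile
import threading


def logged_pwrite(fd, data, offset, key, trace):
    """Call os.pwrite and append a trace entry describing the result."""
    try:
        written = os.pwrite(fd, data, offset)
        ok = written == len(data)  # a short write counts as a failure here
    except OSError:
        written, ok = 0, False
    trace.append({
        "thread": threading.get_ident(),
        "key": key,
        "offset": offset,
        "length": len(data),
        "written": written,
        "ok": ok,
    })
    return ok


def demo_trace():
    trace = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.bin")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            offset = 0
            for key in (1, 2):
                payload = f"key={key}\n".encode()
                logged_pwrite(fd, payload, offset, key, trace)
                offset += len(payload)
            # Read everything back to compare against the trace.
            contents = os.pread(fd, offset, 0)
        finally:
            os.close(fd)
    return trace, contents
```

A trace like this makes it possible to tell apart two cases after a crash: the write was never reported as successful, or it was reported as successful at a known offset and length and the data is nonetheless missing from the file.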

            Assignee: backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter: sue.loverso@mongodb.com Susan LoVerso