WiredTiger / WT-3157

checkpoint/transaction integrity issue when writes fail.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: WT2.9.1
    • Fix Version/s: WT2.9.2, 3.2.13, 3.4.3, 3.5.4
    • Labels:
      None
    • Sprint:
      Storage 2017-02-13

      Description

      Branch wt-2909-verify-checkpoint-integrity introduces a test that runs a subprogram, which performs inserts and periodically checkpoints. During a checkpoint, we cause some file system writes to fail, and we expect the subprogram to fail as a result. The parent program then opens a connection to the (failed) home directory and reads what it can.

      The subprogram inserts into two tables within a single transaction. When the failure occurs, we see one table containing many records and the other containing only 1 (we always do a checkpoint after the first record). The test expects both tables to always contain the same number of records. A long comment in test/csuite/wt2909_checkpoint_integrity/main.c describes the test.
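
      The expectation above can be illustrated with a toy model. This is only a sketch, not WiredTiger code: every name in it (toy_db, toy_insert_pair, toy_checkpoint, toy_recover, toy_counts_match) is hypothetical, and each table is reduced to a record count. It assumes that a transaction updates both tables atomically and that, after a failure, a reader sees the state of the last successful checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT the WiredTiger API: each table is reduced to a record count. */
struct toy_state {
    size_t table_a; /* records visible in the first table */
    size_t table_b; /* records visible in the second table */
};

struct toy_db {
    struct toy_state live;       /* committed, in-memory state */
    struct toy_state checkpoint; /* state captured by the last successful checkpoint */
};

/* One transaction inserts one record into each table, atomically. */
static void
toy_insert_pair(struct toy_db *db)
{
    assert(db != NULL);
    db->live.table_a++;
    db->live.table_b++;
}

/* A successful checkpoint captures the current committed state. */
static void
toy_checkpoint(struct toy_db *db)
{
    assert(db != NULL);
    db->checkpoint = db->live;
}

/* After a failure, only the last successful checkpoint is visible (assumption). */
static struct toy_state
toy_recover(const struct toy_db *db)
{
    assert(db != NULL);
    return (db->checkpoint);
}

/* The invariant the test expects: both tables hold the same number of records. */
static bool
toy_counts_match(struct toy_state s)
{
    return (s.table_a == s.table_b);
}
```

      In this model, inserting one pair, checkpointing, and then inserting more pairs before a failure leaves a recovered state with one record in each table. The state reported in this ticket, many records in one table and 1 in the other, is exactly what toy_counts_match would reject.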

      There is a caveat to this JIRA report: we must be sure that an error in the fail_fs code is not violating some assumption of the file system code. In particular, fail_fs does not lock or unlock files, nor does it sync. That is because fail_fs needs to be durable only across process crashes, not system crashes. Perhaps I missed some other assumption.
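
      For readers unfamiliar with write fault injection, the general idea can be sketched as follows. This is not the fail_fs implementation: the names fault_writer and fault_write are hypothetical, the "file" is a plain in-memory buffer, and the trigger (failing every write once a fixed budget of successful writes is used up) is an assumption that this report does not confirm. Like the fail_fs described above, the sketch does no locking and no syncing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fault injector; illustrative only, NOT the fail_fs API. */
struct fault_writer {
    long allowed; /* number of writes that may still succeed */
    char *buf;    /* in-memory stand-in for a file */
    size_t cap;   /* capacity of buf in bytes */
    size_t len;   /* bytes written so far */
};

/*
 * Append n bytes to the buffer. Returns 0 on success, EIO once the write
 * budget is exhausted (the injected fault), or ENOSPC if the buffer is full.
 * No locking and no syncing is done.
 */
static int
fault_write(struct fault_writer *w, const void *data, size_t n)
{
    assert(w != NULL && data != NULL);
    if (w->allowed <= 0)
        return (EIO); /* injected failure: nothing is written */
    if (n > w->cap - w->len)
        return (ENOSPC);
    memcpy(w->buf + w->len, data, n);
    w->len += n;
    w->allowed--;
    return (0);
}
```

      A caller with a budget of two writes sees the first two calls succeed and every later call return EIO, with the buffer left exactly as it was after the second write.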

      To see the failure:

      cd build_posix/test/csuite;
      ./test_wt2909_checkpoint_integrity -v -o 125
      

      That runs the "top level" test as well as the subtest. To run only the subtest, which populates the tables and uses fail_fs to inject failures, do:

      ./test_wt2909_checkpoint_integrity subtest -v -p -o 125 -n 50000
      

      So far, I have verified this failure only on OS X, where it is consistently reproducible.

      For a stack trace of where the write fault was injected, see WT_TEST.subtest/stdout.txt.

            People

            • Assignee: Sue LoVerso (sue.loverso)
            • Reporter: Donald Anderson (donald.anderson)
