WiredTiger / WT-3157

checkpoint/transaction integrity issue when writes fail.

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: WT2.9.2, 3.2.13, 3.4.3, 3.5.4
    • Affects Version/s: WT2.9.1
    • Component/s: None
    • Labels: None
    • Sprint: Storage 2017-02-13

      Branch wt-2909-verify-checkpoint-integrity introduces a test that runs a subprogram, which performs inserts and checkpoints periodically. During a checkpoint, we cause some file system writes to fail, and we expect the subprogram to fail as a result. The parent program then opens a connection to the (failed) home directory and reads what it can.

      The subprogram inserts into two tables within a single transaction. After a failure, we see one table containing many records and the other containing only 1 (we always do a checkpoint after the first record). The test expects both tables to contain the same number of records. A long comment in test/csuite/wt2909_checkpoint_integrity/main.c describes the test.
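
      The invariant the parent checks can be modeled in a few lines of C. This is only a sketch: struct two_tables, insert_pair and tables_consistent are hypothetical names invented here, and plain in-memory counters stand in for the real tables. It does not use the WiredTiger API and is not the code in main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two tables; not WiredTiger API. */
struct two_tables {
    size_t count_a; /* records visible in the first table */
    size_t count_b; /* records visible in the second table */
};

/*
 * Insert one record into each table as a single all-or-nothing step,
 * modeling a transaction that covers both inserts.
 */
static void
insert_pair(struct two_tables *t)
{
    assert(t != NULL);
    t->count_a++;
    t->count_b++;
}

/* The invariant checked after reopening: both counts must match. */
static bool
tables_consistent(const struct two_tables *t)
{
    assert(t != NULL);
    return (t->count_a == t->count_b);
}
```

      In this model, the state reported above (many records in one table, 1 in the other) makes tables_consistent return false, which is the condition the test treats as a failure.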

      There is a caveat to this report. We must be sure that the fail_fs code does not itself violate some assumption made by the file system code. In particular, fail_fs does not lock or unlock files, and it does not sync them. That is because fail_fs only needs to be durable across process crashes, not system crashes. I may have missed some other assumption.
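
      For readers unfamiliar with write fault injection, the general idea can be sketched as a write routine that starts returning errors once an operation budget is used up. This is an illustration only, assuming a simple counter-based policy: struct fail_file, fail_file_write and ops_allowed are hypothetical names, and the sketch is not the fail_fs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fault-injecting "file": an in-memory buffer plus a budget. */
struct fail_file {
    char buf[256];   /* bytes written so far */
    size_t len;      /* number of valid bytes in buf */
    int ops_allowed; /* writes that succeed before failures; negative = never fail */
    int ops_done;    /* successful writes so far */
};

/*
 * Append n bytes, or return an error without writing anything.
 * Once ops_allowed writes have succeeded, every later write returns EIO.
 */
static int
fail_file_write(struct fail_file *f, const void *data, size_t n)
{
    assert(f != NULL && data != NULL);
    if (f->ops_allowed >= 0 && f->ops_done >= f->ops_allowed)
        return (EIO); /* injected failure */
    if (n > sizeof(f->buf) - f->len)
        return (ENOSPC); /* buffer full */
    memcpy(f->buf + f->len, data, n);
    f->len += n;
    f->ops_done++;
    return (0);
}
```

      Like fail_fs as described above, this sketch does no locking and no syncing. A failed write leaves the earlier contents intact, which is the kind of state a reader opening the home directory afterwards has to cope with.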

      To see the failure:

      cd build_posix/test/csuite;
      ./test_wt2909_checkpoint_integrity -v -o 125
      

      That runs the "top level" test as well as the subtest. To run only the subtest, which populates the tables and uses fail_fs to inject failures, run:

      ./test_wt2909_checkpoint_integrity subtest -v -p -o 125 -n 50000
      

      So far I have only verified the failure on OS X, where it is consistently reproducible.

      For a stack trace of where the write fault was injected, see WT_TEST.subtest/stdout.txt.

            Assignee: Susan LoVerso (sue.loverso@mongodb.com)
            Reporter: Donald Anderson (donald.anderson@mongodb.com)
