  1. WiredTiger
  2. WT-4324

Ensure checkpoints rewrite pages with data in the future

    • 13
    • Storage Engines 2018-10-08, Storage Engines 2018-10-22, Storage Engines 2018-11-05, Storage Engines 2018-11-19, Storage Engines 2018-12-03, Storage Engines 2018-12-17, Storage Engines 2018-12-31, Storage Engines 2019-01-28, Storage Engines 2019-02-11, Storage Engines 2019-02-25, Storage Engines 2019-03-11, Storage Engines 2019-03-25
    • v4.0, v3.6

      Issue Status as of April 4, 2019

      ISSUE SUMMARY
      Eviction can sometimes write versions of values to data files that are newer than the versions a checkpoint is allowed to include (this is termed "skew newest" eviction). When that happens, the checkpoint must revisit each of those pages and write the expected versions.
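
      The difference between the two choices can be illustrated with a toy model. This is a minimal sketch, not WiredTiger code: the Update type, the snapshot_max cutoff, and both function names are hypothetical, and real snapshot visibility is more involved than a single ID comparison.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    txn_id: int   # hypothetical: ID of the transaction that wrote this version
    value: str

def skew_newest(updates):
    """Pick the most recent version, regardless of what a checkpoint may see."""
    return max(updates, key=lambda u: u.txn_id)

def checkpoint_visible(updates, snapshot_max):
    """Pick the newest version at or below the checkpoint's (simplified) cutoff."""
    visible = [u for u in updates if u.txn_id <= snapshot_max]
    return max(visible, key=lambda u: u.txn_id) if visible else None

# One key with three versions; the checkpoint may only see transactions <= 20.
chain = [Update(10, "a"), Update(20, "b"), Update(30, "c")]
evicted = skew_newest(chain)
expected = checkpoint_visible(chain, snapshot_max=20)

# When the two differ, the checkpoint has to rewrite the page.
needs_rewrite = evicted != expected
```

      In this model eviction writes the version from transaction 30 while the checkpoint expects the one from transaction 20, so the page is one the checkpoint must revisit.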

      The issue was introduced in WT-4094.

      USER IMPACT
      Checkpoints could contain inconsistent content. This is only possible if cache overflow (lookaside) is in use. Any checkpoint created while the lookaside file is being used could suffer from this issue, which can result in data loss under either of the following conditions:

      1. Data files are copied from a live system, or
      2. The system is restarted after a shutdown.

      WORKAROUNDS
      There is currently no workaround for this issue.

      AFFECTED VERSIONS
      MongoDB 3.6.6+, 4.0.0+

      FIX VERSION
      MongoDB 3.6.12, MongoDB 4.0.9

      RESOLUTION DETAILS
      There were cases where the transaction ID associated with the reconciliation of a page evicted in "skew newest" mode was set in a way that allowed a checkpoint to skip that page. As a result, a checkpoint could be created with invalid content.

      Failure conditions

      The following events need to happen for this failure to occur:

      1. Checkpoint starts.
      2. A transaction T transitions a page from clean to dirty.
      3. The page is evicted to lookaside in "skew newest" mode.
      4. The checkpoint does not rewrite the page.

      At this point, the checkpoint on disk is inconsistent because it contains part of transaction T but not all of it.
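
      The four steps above can be walked through in a small simulation. This is a sketch under stated assumptions, not the engine's logic: the two-page layout, the transaction IDs, and the checkpoint_rewrites_evicted_pages flag are all hypothetical, and it assumes T modifies a second page that is not evicted.

```python
def run_scenario(checkpoint_rewrites_evicted_pages):
    # In-memory version chains: page -> list of (txn_id, value), oldest first.
    T = 2  # hypothetical ID for transaction T; transaction 1 wrote the originals
    pages = {"A": [(1, "a0")], "B": [(1, "b0")]}

    # 1. Checkpoint starts; in this model it may only see transactions <= 1.
    snapshot_max = 1
    checkpoint = {}

    # 2. Transaction T dirties both pages.
    pages["A"].append((T, "a1"))
    pages["B"].append((T, "b1"))

    # 3. Page A is evicted in "skew newest" mode: its newest version hits disk.
    on_disk = {"A": pages["A"][-1]}

    # 4. The checkpoint walks the pages.
    for name, chain in pages.items():
        if name in on_disk and not checkpoint_rewrites_evicted_pages:
            # Page skipped: the checkpoint keeps the evicted image as-is.
            checkpoint[name] = on_disk[name]
        else:
            # Page (re)written with the newest version the checkpoint may see.
            checkpoint[name] = max(v for v in chain if v[0] <= snapshot_max)
    return checkpoint

buggy = run_scenario(checkpoint_rewrites_evicted_pages=False)
fixed = run_scenario(checkpoint_rewrites_evicted_pages=True)
```

      With the rewrite disabled, page A carries T's value while page B does not, which is the "part of transaction T but not all of it" state. With the rewrite enabled, neither page contains T.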

      The next checkpoint or a clean shutdown would normally rewrite this page. However, due to an optimization, these checkpoints may be skipped if the stable timestamp has not changed.
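
      Why the skipped checkpoint matters can be seen in a toy checkpointer. This is only an illustration of the behavior described here: the Checkpointer class, its fields, and the stale_pages list are hypothetical and do not correspond to WiredTiger's implementation.

```python
class Checkpointer:
    """Toy checkpointer that skips work when the stable timestamp is unchanged."""

    def __init__(self):
        self.last_stable_ts = None
        self.rewritten = []  # pages rewritten by checkpoints that actually ran

    def checkpoint(self, stable_ts, stale_pages):
        if stable_ts == self.last_stable_ts:
            return False  # optimization: stable timestamp unchanged, so skip
        self.rewritten.extend(stale_pages)
        stale_pages.clear()
        self.last_stable_ts = stable_ts
        return True

ckpt = Checkpointer()
stale = []
ran_first = ckpt.checkpoint(stable_ts=100, stale_pages=stale)   # runs
stale.append("A")  # page A now holds data newer than the checkpoint allows
ran_second = ckpt.checkpoint(stable_ts=100, stale_pages=stale)  # skipped
stale_after_second = list(stale)
ran_third = ckpt.checkpoint(stable_ts=101, stale_pages=stale)   # runs
```

      In this model page A stays unrepaired after the second call, and is only rewritten once the stable timestamp moves.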

      Test failure description

      A data mismatch (between a COL table and a ROW table) was detected by the wiredtiger-test-checkpoint job on 'kodkod'. A similar failure was reported in WT-4244 (which was closed as a duplicate of WT-4239).

      http://build.wiredtiger.com:8080/job/wiredtiger-test-checkpoint/3785/

      + nice ./test/checkpoint/t -t m -n 1000000 -k 5000000 -C cache_size=100MB
      t: process 11308
          1: 1 workers, 3 tables
      checkpointer thread starting: tid: 11308:0x7f3af6786700
      worker thread starting: tid: 11308:0x7f3aef7fe700
      Finished a checkpoint
      Finished verifying a checkpoint with 3 tables and 0 keys
      Finished a checkpoint
      Finished verifying a checkpoint with 3 tables and 87 keys
      Finished a checkpoint
      ...
      Finished verifying a checkpoint with 3 tables and 443504 keys
      t: 1st cursor didn't find 2nd key: WT_NOTFOUND: item not found
      t: verify_checkpoint - mismatching data: Bad address
      Finished a checkpoint
      ...
      Finished verifying a checkpoint with 3 tables and 470602 keys
      Finished a checkpoint
      Key/value mismatch: 657681/0000000000000000000000000000000375492 from a COL table is not 657675/0000000000000000000000000000000497345 from a ROW table
      Ran workers for: 198.422610 seconds
      Closing connection
      + cleanup
      + status=14 
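
      The kind of check that produces the "Key/value mismatch" line can be sketched as a lockstep comparison of two tables. This is not the test/checkpoint program, which is written in C; the first_mismatch helper and the sample data are hypothetical, with the first sample pair borrowing the keys from the log above.

```python
from itertools import zip_longest

def first_mismatch(col_items, row_items):
    """Walk two ordered (key, value) sequences in lockstep.

    Returns the first differing pair, or None if the sequences are identical.
    A missing entry on either side shows up as None in the returned pair.
    """
    for col, row in zip_longest(col_items, row_items):
        if col != row:
            return col, row
    return None

# Hypothetical table contents; a consistent checkpoint should make these equal.
col_table = [(1, "v1"), (2, "v2"), (657681, "x")]
row_table = [(1, "v1"), (2, "v2"), (657675, "y")]

mismatch = first_mismatch(col_table, row_table)
```

      A result of None means both tables hold the same data at the checkpoint; any other result points at the first position where they diverge.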

       

        1. debug4324.diff
          11 kB
        2. debug4324-2.diff
          13 kB
        3. debug4324-3.diff
          17 kB
        4. debug4324-4.diff
          21 kB
        5. output-340903.txt
          6 kB
        6. debug4324-5.diff
          30 kB
        7. output-5.txt
          16 kB
        8. debug4324-6.diff
          29 kB
        9. output-6.txt
          8 kB

            Assignee:
            brian.lane@mongodb.com Brian Lane
            Reporter:
            luke.chen@mongodb.com Luke Chen
