WiredTiger / WT-4324

Ensure checkpoints rewrite pages with data in the future

    Details

    • Story Points:
      13
    • Sprint:
      Storage Engines 2018-10-08, Storage Engines 2018-10-22, Storage Engines 2018-11-05, Storage Engines 2018-11-19, Storage Engines 2018-12-03, Storage Engines 2018-12-17, Storage Engines 2018-12-31, Storage Engines 2019-01-28, Storage Engines 2019-02-11, Storage Engines 2019-02-25, Storage Engines 2019-03-11, Storage Engines 2019-03-25
    • Backport Requested:
      v4.0, v3.6

      Description

      Issue Status as of April 4, 2019

      ISSUE SUMMARY
      Eviction sometimes writes versions of values to data files that are newer than the versions a checkpoint is allowed to include (this is termed "skew newest" eviction). When that happens, the checkpoint must revisit each of those pages and write the expected versions.

      The issue was introduced in WT-4094.

      USER IMPACT
      This problem means that checkpoints could contain inconsistent content. It is only possible when cache overflow (lookaside) is in use. Any checkpoint created while the lookaside file is being used could suffer from this issue, which can result in data loss under either of the following conditions:

      1. Data files were copied from a live system, or
      2. The system was restarted after a shutdown.

      WORKAROUNDS
      There is currently no workaround for this issue.

      AFFECTED VERSIONS
      MongoDB 3.6.6+, 4.0.0+

      FIX VERSION
      MongoDB 3.6.12, MongoDB 4.0.9

      RESOLUTION DETAILS
      There were cases where the transaction ID associated with such a reconciliation was set in a way that allowed a checkpoint to skip those pages. As a result, a checkpoint could be created with invalid content.

      Failure conditions

      The following events need to happen for this failure to occur:

      1. Checkpoint starts.
      2. A transaction T transitions a page from clean to dirty.
      3. The page is evicted to lookaside in "skew newest" mode.
      4. The checkpoint does not rewrite the page.

      At this point, the checkpoint on disk is inconsistent because it contains part of transaction T but not all of it.
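
      The four steps above can be walked through with a small toy model. This is only a sketch and is not WiredTiger code: the Page class, the evicted_newest flag, the revisit_evicted switch and the integer transaction IDs are all hypothetical names introduced for illustration. The per-key update chain stands in for the history that would live in cache and in the lookaside file.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Per-key history, oldest first: key -> [(txn_id, value), ...].
    # Stands in for in-cache updates plus lookaside history (assumption).
    updates: dict = field(default_factory=dict)
    # What the data file currently holds for this page.
    disk: dict = field(default_factory=dict)
    dirty: bool = False
    # Hypothetical marker: the page was last written by skew-newest eviction.
    evicted_newest: bool = False

    def write(self, txn_id, key, value):
        self.updates.setdefault(key, []).append((txn_id, value))
        self.dirty = True


def evict_skew_newest(page):
    """Write the newest version of every key, ignoring checkpoint snapshots."""
    page.disk = {k: chain[-1][1] for k, chain in page.updates.items()}
    page.dirty = False
    page.evicted_newest = True


def checkpoint_page(page, snapshot_max_txn, revisit_evicted):
    """Write the newest version of each key that the checkpoint may see."""
    must_revisit = revisit_evicted and page.evicted_newest
    if not page.dirty and not must_revisit:
        return  # the page looks clean, so the checkpoint skips it
    page.disk = {}
    for key, chain in page.updates.items():
        visible = [v for txn, v in chain if txn <= snapshot_max_txn]
        if visible:
            page.disk[key] = visible[-1]


def run_scenario(revisit_evicted):
    page_a, page_b = Page(), Page()
    for page, key in ((page_a, "a"), (page_b, "b")):
        page.write(1, key, "old")          # committed before the checkpoint
        page.disk = {key: "old"}
        page.dirty = False

    snapshot_max_txn = 1                   # 1. checkpoint starts
    page_a.write(2, "a", "new")            # 2. transaction T dirties page A
    page_b.write(2, "b", "new")            #    (and page B)
    evict_skew_newest(page_a)              # 3. page A evicted, newest version written
    for page in (page_a, page_b):          # 4. checkpoint visits the pages
        checkpoint_page(page, snapshot_max_txn, revisit_evicted)
    return {**page_a.disk, **page_b.disk}


print(run_scenario(revisit_evicted=False))  # page A is skipped
print(run_scenario(revisit_evicted=True))   # page A is rewritten
```

      With revisit_evicted=False the on-disk result mixes T's update to "a" with the pre-T value of "b", which is the "part of transaction T but not all of it" state. With revisit_evicted=True the checkpoint rewrites page A and both keys hold the pre-T value.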

      The next checkpoint or a clean shutdown would normally rewrite this page. However, due to an optimization, these checkpoints may be skipped if the stable timestamp has not changed.
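
      A minimal sketch of why that optimization keeps the bad page on disk, assuming the skip rule is simply "the stable timestamp equals the one used by the previous checkpoint". The ticket does not spell out the real condition, and the function names and integer timestamps here are hypothetical.

```python
def checkpoint_needed(stable_ts, last_checkpoint_stable_ts):
    """Assumed skip rule: only checkpoint when the stable timestamp moved."""
    return stable_ts != last_checkpoint_stable_ts


def run_checkpoints(stable_timestamps, first_checkpoint_ts):
    """Return, for each later checkpoint attempt, whether it actually ran."""
    last_ts = first_checkpoint_ts
    ran = []
    for ts in stable_timestamps:
        if checkpoint_needed(ts, last_ts):
            ran.append(True)
            last_ts = ts
        else:
            ran.append(False)  # skipped: nothing gets rewritten
    return ran


# The inconsistent checkpoint was taken at stable timestamp 10 (illustrative).
print(run_checkpoints([10, 10, 11], first_checkpoint_ts=10))
```

      In this model the two attempts at timestamp 10 are skipped, so the inconsistent page would stay on disk until the stable timestamp advances.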

      Test failure description

      A data mismatch (between a COL table and a ROW table) was detected in the wiredtiger-test-checkpoint job on 'kodkod'. A similar failure was reported in WT-4244 (which was closed as a duplicate of WT-4239).

      http://build.wiredtiger.com:8080/job/wiredtiger-test-checkpoint/3785/

      + nice ./test/checkpoint/t -t m -n 1000000 -k 5000000 -C cache_size=100MB
      t: process 11308
          1: 1 workers, 3 tables
      checkpointer thread starting: tid: 11308:0x7f3af6786700
      worker thread starting: tid: 11308:0x7f3aef7fe700
      Finished a checkpoint
      Finished verifying a checkpoint with 3 tables and 0 keys
      Finished a checkpoint
      Finished verifying a checkpoint with 3 tables and 87 keys
      Finished a checkpoint
      ...
      Finished verifying a checkpoint with 3 tables and 443504 keys
      t: 1st cursor didn't find 2nd key: WT_NOTFOUND: item not found
      t: verify_checkpoint - mismatching data: Bad address
      Finished a checkpoint
      ...
      Finished verifying a checkpoint with 3 tables and 470602 keys
      Finished a checkpoint
      Key/value mismatch: 657681/0000000000000000000000000000000375492 from a COL table is not 657675/0000000000000000000000000000000497345 from a ROW table
      Ran workers for: 198.422610 seconds
      Closing connection
      + cleanup
      + status=14 
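
      The kind of cross-table check that produces the "Key/value mismatch" line above can be sketched as follows. This is not the code of test/checkpoint/t; it only assumes that each table is read in key order from the same checkpoint and that the tables must hold identical key/value pairs. The function names and sample data are hypothetical.

```python
from itertools import zip_longest


def first_mismatch(col_items, row_items):
    """Return the first differing (COL entry, ROW entry) pair, or None."""
    for col, row in zip_longest(col_items, row_items):
        if col != row:
            return col, row
    return None


def describe(mismatch):
    """Format a mismatch in the style of the log line above."""
    if mismatch is None:
        return "tables match"
    col, row = mismatch
    return (f"Key/value mismatch: {col} from a COL table "
            f"is not {row} from a ROW table")


# Illustrative data only: two tables that should be identical in a checkpoint.
col_table = [(1, "v1"), (2, "v2"), (3, "v3-new")]
row_table = [(1, "v1"), (2, "v2"), (3, "v3-old")]
print(describe(first_mismatch(col_table, row_table)))
```

      zip_longest is used so that a table that is missing trailing keys is also reported, with None standing in for the absent entry.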

       

        Attachments

        1. debug4324.diff
          11 kB
        2. debug4324-2.diff
          13 kB
        3. debug4324-3.diff
          17 kB
        4. debug4324-4.diff
          21 kB
        5. debug4324-5.diff
          30 kB
        6. debug4324-6.diff
          29 kB
        7. output-340903.txt
          6 kB
        8. output-5.txt
          16 kB
        9. output-6.txt
          8 kB

              People

              Assignee:
              brian.lane Brian Lane
              Reporter:
              luke.chen Luke Chen
