  1. WiredTiger
  2. WT-2969

Possible snapshot corruption during compaction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: WT2.8.0
    • Fix Version/s: 3.2.12, WT2.9.1, 3.5.1, 3.4.2
    • Labels:
      None
    • Environment:
      Reproduced on both Amazon Linux and macOS.
    • # Replies:
      14
    • Last comment by Customer:
      true
    • Sprint:
      Storage 2016-11-21, Storage 2016-12-12

      Description

      We believe we have hit a snapshot corruption issue in 2.8.0 when running compaction concurrently with writes. A minimal repro case is given below; we would appreciate comments if there is a problem with what we’re doing. What appears to happen is that the default snapshot created during compaction has a table and its index out of sync with each other (records that exist in one are missing from the other, and vice versa). Calling session.verify() does not report any problems, but explicitly comparing the records in the index against the table uncovers a large number of mismatches.

      We have not yet been able to confirm whether only the snapshot is corrupted or whether the state of the BTree is affected as well. Early indications are that only the snapshot is impacted, because explicitly calling checkpoint() after compact() brings the table and index back into a consistent state.

      Here’s the minimal repro case (it uses the Java API, but I don't think that's relevant here). It goes like this:

      · Open a connection and create a table with one index
      · Verify the contents of the table against the contents of the index. (Note: there are no mutations at this point, so both are expected to match perfectly.)
      · Start one writer thread (90/10 ratio of adds to deletes)
      · Sleep 10 seconds
      · Run compaction on both the table and the index
      · Sleep 10 more seconds
      · Exit

      On the first run, we get the expected “0 corrupted records out of 0”. On the second run (note that a snapshot now exists, created by compaction), we get the following error: “6598 corrupted records out of 1138935”.
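
      The table-versus-index comparison in the verification step could look roughly like the sketch below. It is not the reporter's code: in-memory collections stand in for WiredTiger table and index cursors, and the class name, the method names, and the assumption that the second number in the message is the table's row count are all hypothetical. It only shows the two-directional check (rows missing from the index, and index entries pointing at no matching row).

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in: a table is modeled as primaryKey -> indexedValue,
// and an index as a set of (indexedValue, primaryKey) pairs.
final class IndexConsistencyCheck {
    /** Builds one index entry: (indexed column value, primary key). */
    static Map.Entry<String, Long> entry(String value, long key) {
        return new SimpleImmutableEntry<>(value, key);
    }

    /**
     * Counts mismatches in both directions: table rows with no matching index
     * entry, plus index entries that do not match any table row.
     */
    static int countMismatches(Map<Long, String> table, Set<Map.Entry<String, Long>> index) {
        int mismatches = 0;
        for (Map.Entry<Long, String> row : table.entrySet()) {
            if (!index.contains(entry(row.getValue(), row.getKey()))) {
                mismatches++; // row is in the table but not in the index
            }
        }
        for (Map.Entry<String, Long> e : index) {
            if (!e.getKey().equals(table.get(e.getValue()))) {
                mismatches++; // index entry has no matching table row
            }
        }
        return mismatches;
    }

    /** Formats the result in the style of the messages quoted in the report. */
    static String report(Map<Long, String> table, Set<Map.Entry<String, Long>> index) {
        return countMismatches(table, index) + " corrupted records out of " + table.size();
    }

    public static void main(String[] args) {
        Map<Long, String> table = new TreeMap<>();
        Set<Map.Entry<String, Long>> index = new HashSet<>();
        System.out.println(report(table, index)); // 0 corrupted records out of 0

        table.put(1L, "a");
        index.add(entry("a", 1L)); // consistent pair
        table.put(2L, "b");        // row with no index entry
        index.add(entry("c", 3L)); // index entry with no row
        System.out.println(report(table, index)); // 2 corrupted records out of 2
    }
}
```

      A check like this catches the mismatch class described above precisely because it compares the two structures against each other, rather than validating each one in isolation.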

          Activity

          xgen-internal-githook Githook User added a comment -

          Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

          Message: Import wiredtiger: 1b6c815a3fd34f14c20d5cd627155799d1de535c from branch mongodb-3.6

          ref: ca6eee06ff..1b6c815a3f
          for: 3.5.1

          WT-2336 Add a test validating schema operations via file system call monitoring
          WT-2670 Add option to configure read-ahead per table and change default behavior
          WT-2960 Inserting multi-megabyte values can cause pathological lookaside usage
          WT-2969 Fix a bug that could cause snapshot corruption during compaction
          WT-3014 Add GCC/clang support for ELF symbol visibility.
          WT-3021 Fixes needed for Java log cursor example, Java raw mode cursors, log cursors in raw mode
          WT-3025 fix error path in log_force_sync
          WT-3028 Workloads with all dirty pages could trigger diagnostic stuck check
          WT-3030 Test failure indicating invalid key order during traversal
          WT-3034 Add support for single-writer named snapshots.
          WT-3037 Fix some outdated comments in logging
          WT-3048 WiredTiger maximum size warning uses the wrong format.
          WT-3051 Remove external __wt_hex symbol.
          WT-3052 Improve search if an index hint is wrong
          WT-3053 Review Python and Java calls to internal WiredTiger functions
          WT-3054 Java PackTest, PackTest03 do not compile
          WT-3055 Java AsyncTest faults
          WT-3057 WiredTiger hazard pointers should use the WT_REF, not the WT_PAGE.
          WT-3064 minor tree cleanups: .gitignore, NEWS misspelling
          Branch: master
          https://github.com/mongodb/mongo/commit/21a6f07d859c132154166bd3d83bbed238d5d719

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2969 Possible snapshot corruption during compaction (#3160)

          • Change compaction to use database-wide checkpoints: rather than processing each file separately and checkpointing after each time we compact a file, do a compaction pass over all of the files then database-wide checkpoints.
          • Save/restore the WT_SESSION_CAN_WAIT, WT_SESSION_NO_EVICTION flags set during checkpoint, the session handle may be used for other tasks in the future.
          • There's no need to hold the checkpoint lock while opening the metadata cursor.
          • When adc1cfb went in (WT-2394, gather handles for compaction first, before doing the actual compaction), we broke data-source support for compaction. Add back in data-source support and simplify the __wt_schema_worker() code, it no longer needs to know about LSM or data-source compaction.
          • Disallow LSM compaction in an existing transaction (LSM didn't check, but there's no reason to special case LSM compaction so it can run in an existing transaction, and it's potentially confusing, or fragile if LSM compaction some day requires checkpoints).
          • Do a checkpoint after removing the key/value pairs, otherwise we might not find anything to work with in our compaction pass.
          • Replace WT_SESSION_LOCK_NO_WAIT with per-lock operation flags. If we set WT_SESSION_LOCK_NO_WAIT and the lock is acquired, then the underlying operation eventually needs to acquire its own locks, having the WT_SESSION_LOCK_NO_WAIT flag set in the session may not be correct.
          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/f1152ba768da7e03fbca5131b289f0407565050a
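
          The first item in the commit message (per-file checkpoints versus a compaction pass followed by a database-wide checkpoint) can be pictured with the toy model below. This is only an illustration under stated assumptions, not WiredTiger's implementation and not a confirmed account of the root cause: the ToyFile type, the record counters, and the interleaving of the writer are all hypothetical. It shows how checkpointing a table and its index at different moments, with a write in between, can leave the two checkpoints describing different states.

```java
// Toy model only: hypothetical names, no WiredTiger calls.
final class CheckpointOrderingSketch {
    /** Stand-in for one file: live record count and the count captured by its last checkpoint. */
    static final class ToyFile {
        int live;
        int checkpointed;

        void checkpoint() {
            checkpointed = live;
        }
    }

    /** One logical write updates the table and its index together. */
    static void write(ToyFile table, ToyFile index) {
        table.live++;
        index.live++;
    }

    /** Per-file scheme: checkpoint after each file, with a write landing in between. */
    static int skewWithPerFileCheckpoints() {
        ToyFile table = new ToyFile();
        ToyFile index = new ToyFile();
        write(table, index);
        table.checkpoint();  // compact the table, then checkpoint it
        write(table, index); // concurrent writer runs between the two checkpoints
        index.checkpoint();  // compact the index, then checkpoint it
        return index.checkpointed - table.checkpointed;
    }

    /** Database-wide scheme: compaction pass over both files, then one checkpoint covering both. */
    static int skewWithDatabaseWideCheckpoint() {
        ToyFile table = new ToyFile();
        ToyFile index = new ToyFile();
        write(table, index);
        write(table, index); // writer runs during the compaction pass
        table.checkpoint();  // both files are captured at the same point
        index.checkpoint();
        return index.checkpointed - table.checkpointed;
    }

    public static void main(String[] args) {
        System.out.println("per-file skew: " + skewWithPerFileCheckpoints());          // 1
        System.out.println("database-wide skew: " + skewWithDatabaseWideCheckpoint()); // 0
    }
}
```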
          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2969 Possible snapshot corruption during compaction (#3160)

          • Change compaction to use database-wide checkpoints: rather than processing each file separately and checkpointing after each time we compact a file, do a compaction pass over all of the files then database-wide checkpoints.
          • Save/restore the WT_SESSION_CAN_WAIT, WT_SESSION_NO_EVICTION flags set during checkpoint, the session handle may be used for other tasks in the future.
          • There's no need to hold the checkpoint lock while opening the metadata cursor.
          • When adc1cfb went in (WT-2394, gather handles for compaction first, before doing the actual compaction), we broke data-source support for compaction. Add back in data-source support and simplify the __wt_schema_worker() code, it no longer needs to know about LSM or data-source compaction.
          • Disallow LSM compaction in an existing transaction (LSM didn't check, but there's no reason to special case LSM compaction so it can run in an existing transaction, and it's potentially confusing, or fragile if LSM compaction some day requires checkpoints).
          • Do a checkpoint after removing the key/value pairs, otherwise we might not find anything to work with in our compaction pass.
          • Replace WT_SESSION_LOCK_NO_WAIT with per-lock operation flags. If we set WT_SESSION_LOCK_NO_WAIT and the lock is acquired, then the underlying operation eventually needs to acquire its own locks, having the WT_SESSION_LOCK_NO_WAIT flag set in the session may not be correct.
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/f1152ba768da7e03fbca5131b289f0407565050a
          xgen-internal-githook Githook User added a comment -

          Author: David Hows (daveh86) <howsdav@gmail.com>

          Message: Import wiredtiger: d48181f6f4db08761ed7b80b0332908b272ad0d0 from branch mongodb-3.2

          ref: 040e3d6f76..d48181f6f4
          for: 3.2.12

          SERVER-26545 Remove fixed-size limitation on WiredTiger hazard pointers
          WT-2336 Add a test validating schema operations via file system call monitoring
          WT-2402 Misaligned structure accesses lead to undefined behavior
          WT-2670 Inefficient I/O when read full DB (poor readahead)
          WT-283 Add a way to change persistent object settings
          WT-2960 Inserting multi-megabyte values can cause pathological lookaside usage
          WT-2969 Possible snapshot corruption during compaction
          WT-3014 Add GCC/clang support for ELF symbol visibility.
          WT-3021 Fixes needed for Java log cursor example, Java raw mode cursors, log cursors in raw mode
          WT-3025 fix error path in log_force_sync
          WT-3028 Workloads with all dirty pages could trigger diagnostic stuck check
          WT-3030 Test failure indicating invalid key order during traversal
          WT-3034 Add support for single-writer named snapshots.
          WT-3037 Fix some outdated comments in logging
          WT-3048 WiredTiger maximum size warning uses the wrong format.
          WT-3051 Remove external __wt_hex symbol.
          WT-3052 Improve search if an index hint is wrong
          WT-3053 Review Python and Java calls to internal WiredTiger functions
          WT-3054 Java PackTest, PackTest03 do not compile
          WT-3055 Java AsyncTest faults
          WT-3056 For cursors with projections, keys should be allowed
          WT-3057 WiredTiger hazard pointers should use the WT_REF, not the WT_PAGE.
          WT-3061 syscall test runs with checkpoint_sync=false and doesn't acknowledge pwrite64
          WT-3064 minor tree cleanups: .gitignore, NEWS misspelling
          WT-3066 lint
          WT-3068 Copy wtperf artifacts when running Jenkins tests
          WT-3069 Fix build failures in LevelDB APIs
          WT-3070 Fix search_near() for index cursor
          WT-3071 Java: fix build with -Werror=sign-conversion
          WT-3075 Document and enforce that WiredTiger now depends on Python 2.7
          WT-3078 Fix a hang in the reconfiguration test.
          WT-3084 Fix Coverity resource leak complaint.
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/52b68fa86ea43e909ad42c901d0579bced6b205f

          xgen-internal-githook Githook User added a comment -

          Author: David Hows (daveh86) <howsdav@gmail.com>

          Message: Import wiredtiger: 8d2324943364286056ae399043f70b8a937de312 from branch mongodb-3.4

          ref: ca6eee06ff..8d23249433
          for: 3.4.2

          SERVER-26545 Remove fixed-size limitation on WiredTiger hazard pointers
          WT-2336 Add a test validating schema operations via file system call monitoring
          WT-2402 Misaligned structure accesses lead to undefined behavior
          WT-2670 Inefficient I/O when read full DB (poor readahead)
          WT-283 Add a way to change persistent object settings
          WT-2960 Inserting multi-megabyte values can cause pathological lookaside usage
          WT-2969 Possible snapshot corruption during compaction
          WT-3014 Add GCC/clang support for ELF symbol visibility.
          WT-3021 Fixes needed for Java log cursor example, Java raw mode cursors, log cursors in raw mode
          WT-3025 fix error path in log_force_sync
          WT-3028 Workloads with all dirty pages could trigger diagnostic stuck check
          WT-3030 Test failure indicating invalid key order during traversal
          WT-3034 Add support for single-writer named snapshots.
          WT-3037 Fix some outdated comments in logging
          WT-3048 WiredTiger maximum size warning uses the wrong format.
          WT-3051 Remove external __wt_hex symbol.
          WT-3052 Improve search if an index hint is wrong
          WT-3053 Review Python and Java calls to internal WiredTiger functions
          WT-3054 Java PackTest, PackTest03 do not compile
          WT-3055 Java AsyncTest faults
          WT-3056 For cursors with projections, keys should be allowed
          WT-3057 WiredTiger hazard pointers should use the WT_REF, not the WT_PAGE.
          WT-3061 syscall test runs with checkpoint_sync=false and doesn't acknowledge pwrite64
          WT-3064 minor tree cleanups: .gitignore, NEWS misspelling
          WT-3066 lint
          WT-3068 Copy wtperf artifacts when running Jenkins tests
          WT-3069 Fix build failures in LevelDB APIs
          WT-3070 Fix search_near() for index cursor
          WT-3071 Java: fix build with -Werror=sign-conversion
          WT-3075 Document and enforce that WiredTiger now depends on Python 2.7
          WT-3078 Fix a hang in the reconfiguration test.
          WT-3084 Fix Coverity resource leak complaint.
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/d2c64ac8c526b70eadeb859ec41370a5f03a64aa

