I have been running a new checkpoint test that verifies checkpoints by populating multiple tables with the same data, then periodically checkpointing and verifying that all files have identical content.
The test code is currently in pull request WT-959, which is against the checkpoint-optimizations branch, but this issue is also reproducible in the develop branch.
The issue manifests as an error from the test code of the form:
This means that one table returned key 1752007 and the other returned key 1934748. The expectation is that both are the same.
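To illustrate the kind of comparison that produces such an error, here is a minimal sketch. It is not the test code from the pull request: the function name `first_mismatch` and the key lists are hypothetical, and plain Python lists stand in for cursors walking two tables in the same checkpoint.

```python
from itertools import zip_longest


def first_mismatch(reference, other):
    """Walk two key sequences in lock-step.

    Returns (position, reference_key, other_key) for the first position
    where the keys differ, or None if the sequences are identical.
    A key of None means that sequence ended early.
    """
    for pos, (ref_key, other_key) in enumerate(zip_longest(reference, other)):
        if ref_key != other_key:
            return pos, ref_key, other_key
    return None


# Hypothetical data: table_b is "missing" key 1752007, so the walk
# diverges at the position where table_a still has it.
table_a = [1752005, 1752006, 1752007, 1934748]
table_b = [1752005, 1752006, 1934748]

print(first_mismatch(table_a, table_b))  # (2, 1752007, 1934748)
```

With tables that were populated identically, any non-`None` result means one of them lost or gained a record in the checkpoint.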
After some debugging, it appears that:
- The checkpoint on disk is also "missing" the data
- The data is present in the table if read without a checkpoint
- The same error occurs whether using a named or default checkpoint
- There are no obvious memory corruptions leading to this issue
- The internal tree structure in memory doesn't appear to be corrupted (this is also repeatable: if you open another cursor on the same table/checkpoint, you'll see identical data)
- This issue doesn't appear to be present in the 2.1.2 release, which points towards a change in the new-split branch.