Global visibility can differ between the time a checkpoint visits a btree and the time it finishes checkpointing the history store. When the oldest timestamp moves ahead of the checkpoint timestamp in that window, wrong data is written to disk.
Consider the following scenario:
1. Oldest timestamp is 10 and the stable timestamp is 10.
2. Page A has a key (1000) from timestamp 20.
3. A checkpoint is started at stable timestamp 10.
4. The checkpoint finishes page A and writes its keys to disk, including key 1000 with timestamp 20.
5. Page A is later modified again: another key (2000) is inserted at timestamp 30.
6. The oldest and stable timestamps are moved to 30 while the checkpoint is still running.
7. Eviction is then triggered on page A and writes a new page image to disk. Key 1000 is rewritten without a timestamp because its timestamp (20) is less than the oldest timestamp (30), so the update is globally visible.
8. Key 1000 is updated again at timestamp 50.
9. Eviction is triggered on page A again. It writes the update at timestamp 50 to the data store and the update at timestamp 20 to the history store. Note that the timestamp of the update at timestamp 20 was cleared due to global visibility, so it is written with no timestamp.
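The checkpoint's stable timestamp is 939, but the same update is written to the history store with a start timestamp of zero because of the problem described above.
If rollback to stable (RTS) runs on these checkpoint data files, it restores a key that it shouldn't: the history store record with a zero start timestamp looks stable, even though the original update was made after the checkpoint's stable timestamp.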
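A minimal toy model of the scenario may make the failure easier to follow. It uses the stable timestamp 10 from the numbered steps. Everything in it is a hypothetical sketch, not WiredTiger code: the names `Update`, `clear_if_globally_visible`, `rollback_to_stable` and `run_scenario` are invented for illustration, and the value `"v20"` is made up. The RTS logic is heavily simplified and ignores stop timestamps, transaction IDs and most other real checks. It only shows how dropping the timestamp in step 7 lets RTS treat an unstable update as stable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Update:
    """Hypothetical toy record: one timestamped value for one key."""
    key: int
    value: str
    start_ts: int  # 0 stands for "no timestamp" (treated as globally visible)


def clear_if_globally_visible(upd: Update, oldest_ts: int) -> Update:
    """Step 7 rule (simplified): an update older than the oldest timestamp
    is globally visible, so it is rewritten without a timestamp."""
    if upd.start_ts < oldest_ts:
        return Update(upd.key, upd.value, 0)
    return upd


def rollback_to_stable(on_disk: Update, history: List[Update],
                       stable_ts: int) -> Optional[Update]:
    """Hypothetical, heavily simplified RTS for one key: keep the on-disk
    update if it is stable; otherwise restore the newest stable history
    store update; otherwise remove the key (return None)."""
    if on_disk.start_ts <= stable_ts:
        return on_disk
    stable = [u for u in history
              if u.key == on_disk.key and u.start_ts <= stable_ts]
    if not stable:
        return None
    return max(stable, key=lambda u: u.start_ts)


def run_scenario(oldest_ts_at_eviction: int) -> Optional[Update]:
    checkpoint_stable_ts = 10
    # Step 4: the checkpoint writes key 1000 at timestamp 20 for page A.
    checkpointed_page_a = Update(1000, "v20", 20)
    # Steps 6-7: eviction rewrites the update after the oldest timestamp moves.
    evicted = clear_if_globally_visible(Update(1000, "v20", 20),
                                        oldest_ts_at_eviction)
    # Steps 8-9: the update at 50 pushes the older one into the history store,
    # and the still-running checkpoint later writes that history store content.
    history_store = [evicted]
    # RTS on the checkpoint files: page A from step 4, history store from step 9.
    return rollback_to_stable(checkpointed_page_a, history_store,
                              checkpoint_stable_ts)


if __name__ == "__main__":
    print(run_scenario(30))  # oldest moved past 20: key 1000 is wrongly restored
    print(run_scenario(10))  # oldest not moved past 20: key 1000 is removed
```

With the oldest timestamp moved to 30, the history store record carries start timestamp 0, so the sketch's RTS restores key 1000. If the oldest timestamp had stayed at 10, the record would keep timestamp 20 and the key would be removed, which is the expected result at stable timestamp 10.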