WT can internally produce a stop timestamp that is out of order (OOO) with respect to the start durable timestamp when a history store entry is reinserted. The following steps show how the problem can occur:
- Insert a key with an update U1 using a prepared transaction (commit timestamp 20, durable timestamp 30).
- Remove this key with a normal or prepared transaction (commit timestamp 40).
- Insert the same key again with an update U2 using a prepared transaction (prepare timestamp 50).
- Evict the page. This writes the prepared update U2 (50) to the data store and U1 (20-40) to the history store.
- Start the checkpoint operation.
- In parallel, abort the prepared transaction that performed the U2 update. This brings the history store update back and adds it to the update chain. The history store entry itself is not removed, because removing it could leave the checkpoint inconsistent if the server crashes after this checkpoint.
- Now insert the same key with an update U3 (60) and commit.
- Checkpointing this page writes U3 (60) to the data store and adds U1 (20-40) to the history store again.
- The history store already holds the same update with start timestamp 20, so the reinsertion triggers the OOO logic, which removes entries with a timestamp greater than 20. The existing history store update has a start timestamp of 20 and a stop timestamp of 40, and it gets reinserted with the incorrect time window (20-20).
- However, this entry still has a start durable timestamp of 30, because only the commit timestamps are compared during OOO handling. The resulting time window is invalid, and it triggered an assertion later during reconciliation.
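The last two steps can be modelled with a small Python sketch. This is not WiredTiger code: `TimeWindow`, `is_consistent`, and `ooo_reinsert` are hypothetical names, the consistency rules are an assumed simplification of the real time window validation, and the durable stop timestamp of 40 assumes the remove was a normal (non-prepared) commit.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TimeWindow:
    """Hypothetical model of a history store time window."""
    start_ts: int          # commit timestamp of the update
    durable_start_ts: int  # durable timestamp of the update
    stop_ts: int           # commit timestamp of the update that superseded it
    durable_stop_ts: int   # durable timestamp of the superseding update


def is_consistent(tw: TimeWindow) -> bool:
    # Assumed rules: start must not exceed stop, for both the commit and
    # the durable timestamps, and a durable timestamp is never older than
    # its commit timestamp.
    return (
        tw.start_ts <= tw.stop_ts
        and tw.start_ts <= tw.durable_start_ts
        and tw.stop_ts <= tw.durable_stop_ts
        and tw.durable_start_ts <= tw.durable_stop_ts
    )


def ooo_reinsert(existing: TimeWindow, new_start_ts: int) -> TimeWindow:
    # Simplified OOO handling: only commit timestamps are compared. An
    # existing entry whose stop is newer than the incoming start gets its
    # stop timestamps clamped to the incoming start timestamp.
    if existing.stop_ts > new_start_ts:
        return replace(existing, stop_ts=new_start_ts, durable_stop_ts=new_start_ts)
    return existing


# U1 as first written to the history store: window (20-40), durable start 30.
u1_before = TimeWindow(start_ts=20, durable_start_ts=30, stop_ts=40, durable_stop_ts=40)

# The checkpoint reinserts U1 with start timestamp 20, which triggers OOO handling.
u1_after = ooo_reinsert(u1_before, new_start_ts=20)

print(is_consistent(u1_before))  # True
print(u1_after)                  # window is now (20-20) with durable start 30
print(is_consistent(u1_after))   # False: durable start 30 > durable stop 20
```

In this model the commit timestamps (20-20) still look acceptable on their own. The inconsistency only appears once the durable start timestamp is compared against the clamped stop.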
We cannot avoid reinserting this update into the history store, because no information is available to determine whether the update already exists there.
- related to
WT-9268 Delay deletion of the history store record to reconciliation