Type: Bug
Status: Open
Priority: Major - P3
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: Backlog
Component/s: None
Labels: None
/* Ignore prepared updates if it is checkpoint. */
if (upd->prepare_state == WT_PREPARE_LOCKED ||
  upd->prepare_state == WT_PREPARE_INPROGRESS) {
    WT_ASSERT(session, upd_select->upd == NULL || upd_select->upd->txnid == upd->txnid);
    if (F_ISSET(r, WT_REC_CHECKPOINT)) {
        has_newer_updates = true;
        if (upd->start_ts > max_ts)
            max_ts = upd->start_ts;

        /*
         * Track the oldest update not on the page, used to decide whether reads can use the
         * page image, hence using the start rather than the durable timestamp.
         */
        if (upd->start_ts < r->min_skipped_ts)
            r->min_skipped_ts = upd->start_ts;

        continue;
    } else {
        /*
         * For prepared updates written to the data store in salvage, we write the same
         * prepared value to the data store. If there is still content for that key left in
         * the history store, rollback to stable will bring it back to the data store.
         * Otherwise, it removes the key.
         */
        WT_ASSERT(session,
          F_ISSET(r, WT_REC_EVICT) ||
            (F_ISSET(r, WT_REC_VISIBILITY_ERR) &&
              F_ISSET(upd, WT_UPDATE_PREPARE_RESTORED_FROM_DS)));
        WT_ASSERT(session, upd->prepare_state == WT_PREPARE_INPROGRESS);
    }
}

With the current implementation, checkpoint may see partially resolved prepared updates on the same key and write them to disk.
The detailed scenario is as follows:
1. Suppose the update chain is U_prepared2@10 -> U_prepared1@10.
2. Checkpoint starts.
3. We commit the prepared updates and resolve U_prepared2 to U_committed@11_durable@12.
4. A context switch happens before U_prepared1 is resolved, leaving U_committed@11_durable@12 -> U_prepared1@10 on the update chain.
5. Checkpoint comes to the page, sees U_committed@11_durable@12, and decides to write it to the disk image.
6. Checkpoint then sees U_prepared1@10 and sets has_newer_updates to true, but never unsets the update selected for the disk image (U_committed@11_durable@12).
In this case, we write U_committed@11_durable@12 to the data store and U_prepared1@10 to the history store, which is wrong.
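
The scenario above can be modeled with a small standalone sketch. This is not WiredTiger code: struct update, struct selection, select_as_described, selection_is_torn, and scenario_is_torn are hypothetical names, and the walk keeps only the behavior described in this ticket (select the newest non-prepared update, set has_newer_updates for prepared ones, never revisit the selection). It is meant only to make the inconsistent outcome easy to reproduce and check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not WiredTiger's real structures. */
enum prepare_state { PREPARE_INPROGRESS, PREPARE_LOCKED, PREPARE_RESOLVED };

struct update {
    uint64_t txnid;
    uint64_t start_ts;
    uint64_t durable_ts;
    enum prepare_state prepare_state;
    const struct update *next; /* Next older update on the chain. */
};

struct selection {
    const struct update *upd; /* Update chosen for the disk image. */
    bool has_newer_updates;
};

static bool
is_prepared(const struct update *upd)
{
    return (upd->prepare_state == PREPARE_LOCKED || upd->prepare_state == PREPARE_INPROGRESS);
}

/* Walk the chain newest to oldest, as a checkpoint would in the ticket's description. */
static struct selection
select_as_described(const struct update *head)
{
    struct selection sel = {NULL, false};
    const struct update *upd;

    for (upd = head; upd != NULL; upd = upd->next) {
        if (is_prepared(upd)) {
            /* Mirrors the assertion in the excerpt: same transaction as any selection. */
            assert(sel.upd == NULL || sel.upd->txnid == upd->txnid);
            sel.has_newer_updates = true;
            continue; /* The earlier selection is left untouched. */
        }
        if (sel.upd == NULL)
            sel.upd = upd;
    }
    return (sel);
}

/* True if the selected update has an older, still-prepared update from the same transaction. */
static bool
selection_is_torn(const struct selection *sel)
{
    const struct update *upd;

    if (sel->upd == NULL)
        return (false);
    for (upd = sel->upd->next; upd != NULL; upd = upd->next)
        if (is_prepared(upd) && upd->txnid == sel->upd->txnid)
            return (true);
    return (false);
}

/* Build the two-update chain from the ticket; optionally resolve the older update too. */
bool
scenario_is_torn(bool older_resolved)
{
    struct update older = {1, 10, 0, PREPARE_INPROGRESS, NULL};
    struct update newer = {1, 11, 12, PREPARE_RESOLVED, &older};
    struct selection sel;

    if (older_resolved) {
        older.start_ts = 11;
        older.durable_ts = 12;
        older.prepare_state = PREPARE_RESOLVED;
    }
    sel = select_as_described(&newer);
    return (selection_is_torn(&sel));
}
```

Calling scenario_is_torn(false) models the interleaving in this ticket, where only the newer update has been resolved when the checkpoint walks the chain. Calling scenario_is_torn(true) models the case where both updates were resolved before the checkpoint reached the page.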