- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Backup
- Storage Engines
- 5
- Security 2024-11-11
- v8.0, v7.0, v6.0
This ticket is the result of BF-35359.
Explanation of the Bug:
This issue involves selective backup, rollback to stable (RTS), fast truncate, and reconciliation.
During a selective backup, recovery is called to create fresh metadata for the selectively backed-up files. Since the history store (HS) is copied as-is, it needs to be truncated to contain only the data relevant to the selective backup, discarding everything else. To achieve this, recovery calls RTS to truncate the HS down to only the relevant data.
There are two types of truncate operations: fast truncate and normal truncate. Fast truncate operates at the page level rather than the key level, which is relevant to this issue. When irrelevant HS records are fast-truncated on a page, a flag and a structure called page_del are set. This page_del structure holds details like whether the truncate operation was committed and the transaction that committed it. However, since the HS is non-transactional, none of this information is populated, meaning page_del is allocated for HS but remains empty.
During reconciliation, the data in page_del is used to decide whether to permanently remove a page or retain the original undeleted page. For instance, if page_del->committed is false, indicating a rollback, the page would be retained rather than removed. However, the bug here is that for HS (which is non-transactional), page_del->committed is always false, meaning that every time an HS page is fast-truncated as part of selective backup, it is never actually removed. This leaves data in the HS that can lead to data corruption issues, as seen in this ticket.
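The decision described above can be illustrated with a small Python model. This is only a sketch of the logic as explained in this ticket, not WiredTiger code: the `PageDel` class, its `txnid` field, and both functions are hypothetical names chosen for illustration, and the HS-aware variant is one conceivable adjustment rather than the fix that was actually applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageDel:
    """Toy stand-in for page_del (hypothetical model, not the real struct)."""
    committed: bool = False      # set once the truncating transaction commits
    txnid: Optional[int] = None  # transaction that performed the truncate


def keep_deleted_page(page_del: PageDel) -> bool:
    """Behavior described in this ticket: an uncommitted truncate is treated
    as rolled back, so the original page is retained."""
    return not page_del.committed


def keep_deleted_page_hs_aware(page_del: PageDel, is_history_store: bool) -> bool:
    """Hypothetical adjustment: the HS is non-transactional, so its page_del
    is never populated and a fast-truncated HS page should be removed."""
    if is_history_store:
        return False
    return not page_del.committed


# page_del is allocated for a fast-truncated HS page but never populated.
hs_page_del = PageDel()

# The buggy decision keeps the HS page even though it was truncated.
print(keep_deleted_page(hs_page_del))

# The HS-aware decision removes it.
print(keep_deleted_page_hs_aware(hs_page_del, is_history_store=True))
```

In this model an empty `PageDel` is indistinguishable from a rolled-back truncate, which is why the fast-truncated HS page survives reconciliation.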
Scope:
- Write a reproducer for this bug in Python
- Fix the bug