- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: History Store
- Storage Engines
- 8
- StorEng - 2025-02-04
This ticket came out of investigating BF-35359. The issue described here does not cause the failure seen in that BF; it was found while we were investigating it.
I was able to reproduce the issue in a Python test. In the test below, the HS data for a table excluded from the selective backup is retained because of an interaction between the drop table operation, selective backup, and RTS.
Reproducer:
    # Create 3 tables.
    self.session.create(self.uri, "key_format=S,value_format=S")
    self.session.create(self.newuri, "key_format=S,value_format=S")
    self.session.create(self.newuri1, "key_format=S,value_format=S")

    # Add data to the tables.
    self.add_timestamp_data(self.uri, "key", "val", 1)
    self.add_timestamp_data(self.newuri, "key", "val", 1)
    self.add_timestamp_data(self.newuri1, "key", "val", 1)

    # Add updates to the same records.
    self.add_timestamp_data(self.uri, "key", "val5", 5)
    self.add_timestamp_data(self.newuri, "key", "val5", 5)
    self.add_timestamp_data(self.newuri1, "key", "val", 5)
    self.session.checkpoint()

    # Drop one of the tables. - 'newuri1'. Now this entry gets removed from the metadata.
    # Only newuri and uri should be present in the metadata.
    self.session.begin_transaction()
    self.session.drop(self.newuri1)
    self.session.commit_transaction('commit_timestamp=' + self.timestamp_str(7))

    # Stable timestamp at 10, so that we can retain history store data.
    self.conn.set_timestamp('stable_timestamp=' + self.timestamp_str(10))
    self.session.checkpoint()
    os.mkdir(self.dir)

    # Now copy the files using selective backup. This should not include one of the tables.
    # We only wish to have `uri` present in the selective db, don't consider the others
    # (we don't have to include self.newuri1_file here)
    all_files = self.take_selective_backup(self.dir, [self.newuri_file, self.newuri1_file])

    # After the full backup, open and partially recover the backup database on only one table.
    backup_conn = self.wiredtiger_open(self.dir, "backup_restore_target=[\"{0}\"]".format(self.uri))
    bkup_session = backup_conn.open_session()

    # In the history store data still exists for the `newuri1` table that was not included in the backup.
    # Ideally, the history store data should be removed for the table that was not included in the backup.
    # However, since we dropped the table, and the metadata entry got removed, there is no way to know if
    # the history store data for that table needs to be removed.
    # So, the history store data is retained.
Issue in the Python test:
History store (HS) data for newuri1, which was excluded from the backup, was still present after the partial restore. Because dropping newuri1 removed the table's metadata entry, the system has no way to determine that the table's history store data should be removed. As a result, HS data for an excluded table is retained.
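The interaction can be illustrated with a toy model. This is only a sketch under stated assumptions, not WiredTiger code: plain dicts stand in for the metadata and the history store, the function names, URIs, and btree IDs are all hypothetical, and `purge_by_allow_list` is shown purely as a contrast, not as the agreed fix for this ticket.

```python
# Toy model only: plain dicts stand in for the metadata and the history store.
# All names, URIs, and btree IDs below are hypothetical.

def purge_by_metadata(metadata, history_store, targets):
    """Remove HS entries for tables listed in metadata that are not restore targets."""
    excluded_ids = {btree_id for uri, btree_id in metadata.items() if uri not in targets}
    return {k: v for k, v in history_store.items() if k[0] not in excluded_ids}


def purge_by_allow_list(metadata, history_store, targets):
    """Contrast case: keep only HS entries whose btree ID belongs to a restore target."""
    kept_ids = {metadata[uri] for uri in targets if uri in metadata}
    return {k: v for k, v in history_store.items() if k[0] in kept_ids}


# Metadata after newuri1 was dropped and checkpointed: its entry is gone.
metadata = {"table:uri": 1, "table:newuri": 2}

# HS entries keyed by (btree_id, key, timestamp); btree ID 3 belonged to newuri1.
history_store = {
    (1, "key", 1): "val",
    (2, "key", 1): "val",
    (3, "key", 1): "val",
}
targets = ["table:uri"]

after_metadata_purge = purge_by_metadata(metadata, history_store, targets)
after_allow_list_purge = purge_by_allow_list(metadata, history_store, targets)

print(sorted(k[0] for k in after_metadata_purge))    # [1, 3]: the dropped table's entry survives
print(sorted(k[0] for k in after_allow_list_purge))  # [1]
```

In this model, a cleanup driven by the metadata only sees tables that still have a metadata entry, so the entries for the dropped table (btree ID 3) are never selected for removal.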
Scope:
- List proposed solutions for this issue.
- Fix the issue as part of this ticket.