We already have unit tests that check whether our verification can detect corruption, but they currently work only for regular tables, because they rely on writing invalid data into local files.
This approach doesn’t work for DisAgg shared tables since they are stored in PALS, meaning no data is stored locally.
I currently have two ideas for how to implement this test case, both based on calling the PALM interface during testing, locating the page that should be corrupted, and overwriting it with invalid content (e.g., filling it with zeroes).
The first approach is to use the PALM Python wrapper. A preliminary algorithm might look like this:
- Read the metadata to extract the table_id.
- Use this table_id to open the PL dhandle (via pl_open_handle()).
- Open the checkpoint (possibly via pl_get_open_checkpoint()).
- The unclear part: use plh->put() and plh->get() to somehow overwrite the targeted page with corrupted content.
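The steps above can be sketched end to end against an in-memory stand-in. Everything here is hypothetical: `FakePALM` and `FakePageLogHandle` only mimic the shapes suggested by pl_open_handle(), pl_get_open_checkpoint(), get() and put(), whose real signatures in the PALM Python wrapper are not known here, and the metadata dictionary is an invented stand-in for the real table metadata. The sketch also assumes that put() replaces a page in place, which is exactly the open question in the last step.

```python
import zlib


class FakePageLogHandle:
    """Hypothetical stand-in for a PL dhandle: pages keyed by page ID."""

    def __init__(self):
        self._pages = {}

    def put(self, page_id, data):
        # Assumption: put() overwrites the page in place.
        self._pages[page_id] = bytes(data)

    def get(self, page_id):
        return self._pages[page_id]


class FakePALM:
    """Hypothetical stand-in mimicking pl_open_handle()/pl_get_open_checkpoint()."""

    def __init__(self):
        self._handles = {}
        self._checkpoints = {}  # table_id -> {page_id: expected CRC32}

    def pl_open_handle(self, table_id):
        return self._handles.setdefault(table_id, FakePageLogHandle())

    def pl_get_open_checkpoint(self, table_id):
        return self._checkpoints.setdefault(table_id, {})


def write_page(palm, table_id, page_id, data):
    """Test setup: store a page and record its checksum in the checkpoint."""
    palm.pl_open_handle(table_id).put(page_id, data)
    palm.pl_get_open_checkpoint(table_id)[page_id] = zlib.crc32(data)


def corrupt_page(palm, metadata, table_name, page_id):
    # Step 1: read the (hypothetical) metadata to extract the table_id.
    table_id = metadata[table_name]["table_id"]
    # Step 2: open the PL dhandle for that table.
    plh = palm.pl_open_handle(table_id)
    # Step 3: open the checkpoint and make sure the page belongs to it.
    checkpoint = palm.pl_get_open_checkpoint(table_id)
    if page_id not in checkpoint:
        raise KeyError(f"page {page_id} not in checkpoint of table {table_id}")
    # Step 4: read the page, then overwrite it with zeroes of the same length.
    original = plh.get(page_id)
    plh.put(page_id, bytes(len(original)))
    return table_id


def verify_table(palm, table_id):
    """Return the IDs of pages whose bytes no longer match the recorded checksum."""
    plh = palm.pl_open_handle(table_id)
    checkpoint = palm.pl_get_open_checkpoint(table_id)
    return [pid for pid, crc in checkpoint.items() if zlib.crc32(plh.get(pid)) != crc]
```

In a real test, write_page() would be replaced by inserting data and taking a checkpoint, and verify_table() by the existing verification; the point of the model is that the test only passes if verification flags exactly the page that corrupt_page() touched.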
The second approach is similar but involves writing a custom public C function in PALM that accepts a table_id, a page identifier, and the replacement content, and then overwrites the specified page with that content. This feels a bit more intrusive, as it requires exposing API functionality purely for testing purposes.
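The contract of such a function can be modeled in Python before deciding on the C signature. This is only a sketch: the name `palm_test_overwrite_page`, the `PageStore` class, and the choice to return 0 or an errno value (rather than raise) are all hypothetical, picked to resemble a C-style return code that a wrapper could pass through unchanged.

```python
import errno


class PageStore:
    """Hypothetical in-memory model of PALM pages keyed by (table_id, page_id)."""

    def __init__(self):
        self.pages = {}


def palm_test_overwrite_page(store, table_id, page_id, content):
    """Model of the proposed test-only call: replace one page's bytes.

    Returns 0 on success or an errno value, mimicking a C return code.
    """
    key = (table_id, page_id)
    if key not in store.pages:
        return errno.ENOENT  # unknown table or page
    store.pages[key] = bytes(content)
    return 0
```

With a single call like this, the test no longer needs to know how pages are located through put() and get(); the trade-off is the extra public symbol that exists only for testing.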
Related to WT-14908 (In Progress): Disagg verification testing - investigate the possibility of reusing the existing tests