Add DisaggCorruptionMixin to inject palite page corruption from Python tests


    • Type: Sub-task
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0, 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: Test Python
    • Security Level: Public (Available to anyone on the web)
    • Storage Engines, Storage Engines - Persistence
    • 0.003
    • SE Persistence backlog

      Context

      The Python suite has no helper for producing a corrupt disaggregated database. A test that wants to exercise a corrupt turtle, a corrupt metadata page, a corrupt leaf, a missing page, or a truncated delta chain has to drop into raw SQLite against palite's per-shard pages_<shard>.db files inline.

      Palite holds an exclusive SQLite lock while the WT connection is open, so a Python test that writes to a shard DB must close the WT connection, open the shard DB read-write, run a single statement, and reopen WT. The write should also go through the sqlite3 binary built next to wt (wt_builddir) rather than the system binary, to avoid version skew with the SQLite statically linked into palite.
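The close / write / reopen sequence described above could be factored into one context manager that every corruption helper shares. This is a minimal sketch, not the mixin's actual code: `connection_closed` and `FakeTestCase` are hypothetical names used for illustration, and the only assumption about the real test case is that it exposes `close_conn()` and `reopen_conn()`.

```python
from contextlib import contextmanager


@contextmanager
def connection_closed(testcase):
    """Run the body with the WT connection closed, then always reopen it."""
    # Closing the connection is what lets palite release its SQLite lock.
    testcase.close_conn()
    try:
        yield
    finally:
        # Reopen even if the body raised, so later test steps still have a connection.
        testcase.reopen_conn()


class FakeTestCase:
    """Stand-in for a WT test case (illustration only); records call order."""

    def __init__(self):
        self.events = []

    def close_conn(self):
        self.events.append('close')

    def reopen_conn(self):
        self.events.append('reopen')
```

A helper would then wrap its single SQLite statement in `with connection_closed(self):` so the reopen cannot be forgotten on an error path.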

      Palite's schema is fixed at ext/page_log/palite/palite.cpp:1510:

      • Primary key (table_id, page_id, lsn).
      • Payload column page_data BLOB.
      • Flags column with bits WT_PAGE_LOG_DELTA = 0x2 and WT_PAGE_LOG_DISCARDED = 0x10000 (static-asserted at palite.cpp:1506-1507).
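As an illustration of the two flag bits listed above, a helper might decode a row's flags value with plain bit masks. The constant values come from this ticket; `describe_flags` is a hypothetical name and not an existing helper.

```python
# Bit values as stated in this ticket (static-asserted in palite.cpp).
WT_PAGE_LOG_DELTA = 0x2
WT_PAGE_LOG_DISCARDED = 0x10000


def describe_flags(flags):
    """Hypothetical helper: report which of the two known bits are set."""
    return {
        'delta': bool(flags & WT_PAGE_LOG_DELTA),
        'discarded': bool(flags & WT_PAGE_LOG_DISCARDED),
    }
```

A test that needs to target only delta rows, or to skip discarded pages, could filter on these masks instead of hard-coding the numeric values.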

      Existing helpers to reuse:

      • get_shard_id at test/suite/helpers/helper_disagg.py:71 — maps table_id to shard.
      • get_table_id at test/suite/helpers/metadata_helper.py:40 — maps URI to table_id.

      Motivation

      Without a shared helper, every author writing a corrupt-state Python test reinvents the same close-conn / sqlite-UPDATE / reopen-conn dance. The result is duplicated logic and tests that drift in how they target rows. A single mixin keeps every corruption helper consistent with palite's schema and lock semantics.

      Out of scope

      • Tests against any consuming wt subcommand. Those live with each subcommand's own ticket and call into this mixin.
      • A non-palite implementation of the helpers.

      Examples

      Sketch of corrupt_page_image:

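      import os
      import subprocess

      # wt_builddir and get_shard_id are assumed to be in scope in the test module.
      def corrupt_page_image(self, table_id, page_id, lsn=None):
          db = os.path.join(self.home, 'kv_home',
                            f'pages_{get_shard_id(table_id):02d}.db')
          # The sqlite3 shell does not bind '?' parameters from argv (extra
          # arguments are run as further commands), so the integer keys are
          # formatted into the statement directly.
          tid, pid = int(table_id), int(page_id)
          lsn_sql = 'NULL' if lsn is None else str(int(lsn))
          # Overwrite the first payload byte with 0xff. X'FF' is a one-byte blob
          # literal; char(0xff) would yield a two-byte UTF-8 string instead.
          # With lsn=None, COALESCE falls back to the newest lsn for the page.
          sql = ("UPDATE pages "
                 "SET page_data = CAST(X'FF' || substr(page_data, 2) AS BLOB) "
                 f"WHERE table_id={tid} AND page_id={pid} "
                 f"  AND lsn = COALESCE({lsn_sql}, "
                 "    (SELECT MAX(lsn) FROM pages "
                 f"     WHERE table_id={tid} AND page_id={pid}))")
          # Close the WT connection so palite releases its SQLite lock.
          self.close_conn()
          try:
              subprocess.run([os.path.join(wt_builddir, 'sqlite3'), db, sql],
                             check=True)
          finally:
              self.reopen_conn()
          return (page_id, lsn)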
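The row-targeting logic of the UPDATE can be checked in isolation against an in-memory database. The sketch below uses Python's built-in sqlite3 module purely for demonstration; a real test should still go through the sqlite3 binary in wt_builddir, as described in Context. The table layout is an assumption built from the schema summary in this ticket: the column name `flags`, the column types, and the helper names `make_demo_db` and `corrupt` are hypothetical, and the statement is a variant of the sketch that writes a one-byte blob literal.

```python
import sqlite3

# Variant of the sketch's UPDATE using named parameters, which the Python
# sqlite3 module (unlike the sqlite3 shell's argv) can bind directly.
CORRUPT_SQL = """
UPDATE pages
SET page_data = CAST(X'FF' || substr(page_data, 2) AS BLOB)
WHERE table_id = :tid AND page_id = :pid
  AND lsn = COALESCE(:lsn,
      (SELECT MAX(lsn) FROM pages
       WHERE table_id = :tid AND page_id = :pid))
"""


def make_demo_db():
    """Build an in-memory table shaped like the assumed palite schema."""
    conn = sqlite3.connect(':memory:')
    conn.execute("""CREATE TABLE pages (
        table_id INTEGER, page_id INTEGER, lsn INTEGER,
        flags INTEGER, page_data BLOB,
        PRIMARY KEY (table_id, page_id, lsn))""")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?, ?, ?)",
        [(7, 1, 10, 0, b'\x01\x02\x03'),     # older full image
         (7, 1, 20, 0x2, b'\x04\x05\x06')])  # newer row with the delta bit set
    return conn


def corrupt(conn, table_id, page_id, lsn=None):
    """Overwrite the first payload byte of one row; return rows changed."""
    cur = conn.execute(CORRUPT_SQL,
                       {'tid': table_id, 'pid': page_id, 'lsn': lsn})
    return cur.rowcount


def page_data(conn, table_id, page_id, lsn):
    """Fetch the payload for one (table_id, page_id, lsn) key."""
    row = conn.execute(
        "SELECT page_data FROM pages "
        "WHERE table_id=? AND page_id=? AND lsn=?",
        (table_id, page_id, lsn)).fetchone()
    return row[0]
```

With `lsn=None` only the row at the highest lsn is touched, which matches the intent of the COALESCE fallback; passing an explicit lsn targets an older image instead.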
      

            Assignee:
            Sean Watt
            Reporter:
            Sean Watt