Missing cross-version compatibility tests for WT_CRYPT_HEADER / key provider format


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines - Foundations
    • 686.729

      Summary

      The key-provider subsystem persists a binary on-disk header (WT_CRYPT_HEADER) inside every KEK page written to disaggregated storage. That header is versioned (version, compatible_version, header_size) so the format can evolve without breaking older readers. However, no test in the suite exercises the cross-version readback path, in which a binary built against one header version reads KEK pages written by a binary built against another.
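
      To make the versioning contract concrete, the following is a minimal Python sketch of how a reader might apply the three fields. The field widths and order, the optional key_epoch field, and the helper names pack_header and parse_header are all assumptions made for illustration; this is not the real WT_CRYPT_HEADER layout.

```python
import struct

# Hypothetical layout, NOT the real WT_CRYPT_HEADER:
# version (u8), compatible_version (u8), header_size (u16), little-endian.
_FIXED = struct.Struct("<BBH")
# Hypothetical optional field appended by a later header version.
_EXT = struct.Struct("<I")
MIN_SIZE = _FIXED.size  # stands in for WT_CRYPT_HEADER_MIN_SIZE


def pack_header(version, compatible_version, key_epoch=None):
    """Serialize a header; key_epoch is omitted by 'older' writers."""
    ext = b"" if key_epoch is None else _EXT.pack(key_epoch)
    return _FIXED.pack(version, compatible_version, MIN_SIZE + len(ext)) + ext


def parse_header(buf, reader_version):
    """Parse a header as a reader that understands up to reader_version."""
    if len(buf) < MIN_SIZE:
        raise ValueError("buffer shorter than minimum header")
    version, compat, header_size = _FIXED.unpack_from(buf, 0)
    if not MIN_SIZE <= header_size <= len(buf):
        raise ValueError("implausible header_size")
    if compat > reader_version:
        raise ValueError("header requires a newer reader")
    # Fields beyond MIN_SIZE are treated as absent when header_size is small.
    key_epoch = None
    if header_size >= MIN_SIZE + _EXT.size:
        (key_epoch,) = _EXT.unpack_from(buf, MIN_SIZE)
    return {
        "version": version,
        "compatible_version": compat,
        "header_size": header_size,  # the payload starts at this offset
        "key_epoch": key_epoch,
    }
```

      In this sketch the reader skips header_size bytes to reach the payload, so a newer writer can append fields without disturbing an older reader, provided compatible_version is left at a level that reader supports.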

      What's Not Covered

      • No upgrade test: no test writes a KEK page with an older WT_CRYPT_HEADER layout and then verifies a current-build reader correctly parses it (including handling a smaller header_size and treating post-WT_CRYPT_HEADER_MIN_SIZE fields as absent).
      • No downgrade test: no test writes a KEK page with the current layout and verifies an older binary, built with knowledge only of compatible_version, can still read it.
      • No mixed-version coexistence test: no test exercises a single disaggregated page log containing KEK pages from multiple binary versions side by side, which is the realistic upgrade/downgrade scenario for a live system.
      • No rotation-during-format test: existing format runs technically configure the key provider, but a separate format bug (WT-17691) means only one KEK page is ever written per run. As a result, even today's format coverage does not stress repeated rotation against the on-disk header format.
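
      The mixed-version coexistence case can be modeled in a few lines. The sketch below is a toy simulation under stated assumptions: the three-field layout, make_page, and rejected_pages are hypothetical, and a Python list stands in for the disaggregated page log. It only shows the kind of assertion such a test would make, namely that a given reader rejects none of the pages in a log written by several versions.

```python
import struct

# Hypothetical layout: version (u8), compatible_version (u8), header_size (u16).
HDR = struct.Struct("<BBH")


def make_page(version, compatible_version, payload=b""):
    """Build a toy KEK page: header followed by an opaque payload."""
    return HDR.pack(version, compatible_version, HDR.size) + payload


def rejected_pages(page_log, reader_version):
    """Return the indexes of pages this reader would refuse to parse."""
    bad = []
    for index, page in enumerate(page_log):
        _version, compat, header_size = HDR.unpack_from(page, 0)
        too_new = compat > reader_version
        bad_size = header_size < HDR.size or header_size > len(page)
        if too_new or bad_size:
            bad.append(index)
    return bad


# A log written alternately by a "version 1" and a "version 2" binary,
# where the newer writer keeps compatible_version at 1.
mixed_log = [
    make_page(1, 1, b"kek-a"),
    make_page(2, 1, b"kek-b"),
    make_page(1, 1, b"kek-c"),
]
```

      With this log, rejected_pages(mixed_log, 1) is empty. Appending make_page(3, 2, b"kek-d") makes the version 1 reader reject index 3, which is the compatible_version regression a real multi-binary test would be expected to surface.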

      Why This Matters

      Every version bump of WT_CRYPT_HEADER can introduce three independent failure modes, none of which existing tests catch:

      • Field-layout drift: a future PR moves or resizes a field; older readers using WT_CRYPT_HEADER_MIN_SIZE to clamp their view silently misinterpret bytes.
      • compatible_version regressions: a writer bumps version without leaving compatible_version at a level older binaries actually support. No test fails automatically; the first symptom is older binaries rejecting checkpoints in production.
      • Padding/alignment surprises across header_size boundaries: a field added between the current WT_CRYPT_HEADER_MIN_SIZE and a future minimum changes alignment when older readers truncate to their compile-time size.
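
      The field-layout drift case is easy to reproduce in miniature. In the sketch below, every layout and field name (key_id, flags) is hypothetical and chosen only to illustrate the failure; it does not describe the real header. An "old" reader decodes a header written by a "new" writer that inserted a field in the middle, and no error is raised.

```python
import struct

# Hypothetical old layout: version, compatible_version, header_size, key_id.
OLD = struct.Struct("<BBHI")
# Drifted layout: a new 'flags' field was inserted BEFORE key_id.
DRIFTED = struct.Struct("<BBHII")
# Safe evolution: the new 'flags' field is appended AFTER key_id.
APPENDED = struct.Struct("<BBHII")


def write_drifted(key_id, flags):
    return DRIFTED.pack(2, 1, DRIFTED.size, flags, key_id)


def write_appended(key_id, flags):
    return APPENDED.pack(2, 1, APPENDED.size, key_id, flags)


def old_reader_key_id(buf):
    """An old reader clamps its view to the layout it was compiled with."""
    _version, _compat, _header_size, key_id = OLD.unpack_from(buf, 0)
    return key_id


# The old reader returns the flags value as if it were the key id.
misread = old_reader_key_id(write_drifted(key_id=0xABCD, flags=0x1))
# With an append-only change the old reader still sees the right key id.
correct = old_reader_key_id(write_appended(key_id=0xABCD, flags=0x1))
```

      Here misread is 0x1 rather than 0xABCD, and the parse succeeds without any exception. A synthetic unit test written against either layout alone would pass, which is why agreement between two real binaries has to be tested directly.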

      The Catch2 unit tests in test/catch2/ext/test_key_provider_header.cpp cover the validator's logic against synthetic headers, which is necessary but not sufficient. A synthetic test can show that a given byte pattern is parsed correctly; it cannot show that two real binaries built at different points in history agree on the byte pattern.

      What the Missing Test Should Look Like

      A proper cross-version test needs to run real binaries against the same disaggregated storage:

      1. Seed a RUNDIR using a binary from the previous release (or develop) with disagg.key_provider=1, running long enough to trigger multiple KEK rotations.
      2. Run the current branch's binary in -R (reopen) mode against that RUNDIR, rotating keys and writing its own KEK pages on top.
      3. Run the previous-release binary again against the now-mixed RUNDIR and verify that it can still read the KEK pages written by both binaries.
      4. Optionally reverse the order to cover the downgrade direction explicitly.
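
      The steps above can be sketched as a small driver. This is an illustration under stated assumptions, not the proposed evergreen task: the binary paths are placeholders, passing the run directory with -h is an assumption about the format binary's command line, and build_plan and run_plan are hypothetical helpers. Only -R and disagg.key_provider=1 come from the steps themselves.

```python
import subprocess


def build_plan(old_bin, new_bin, rundir, reverse=False):
    """Return the three commands for one cross-version pass over rundir.

    With reverse=True the roles swap, covering the downgrade direction.
    """
    first, second = (new_bin, old_bin) if reverse else (old_bin, new_bin)
    return [
        # Step 1: seed the run directory with the key provider enabled.
        [first, "-h", rundir, "disagg.key_provider=1"],
        # Step 2: reopen with the other binary, adding its own KEK pages.
        [second, "-R", "-h", rundir],
        # Step 3: reopen with the first binary against the mixed directory.
        [first, "-R", "-h", rundir],
    ]


def run_plan(plan):
    """Run each command in order; any non-zero exit fails the test."""
    for command in plan:
        subprocess.run(command, check=True)


# Placeholder paths; a real task would build or fetch both binaries.
plan = build_plan("/path/to/old/t", "/path/to/new/t", "RUNDIR")
```

      Calling run_plan(plan) would execute the sequence, and build_plan(..., reverse=True) yields the downgrade ordering from step 4. The run length needed to trigger multiple KEK rotations is left out because it depends on configuration not given here.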

      This is the gap that a new evergreen task in test/evergreen.yml should close: an automated, multi-binary, mixed-RUNDIR exercise that runs on every disagg PR.

      Why the Absence Went Unnoticed

      • The validator's unit tests gave a false sense of coverage: they exercise the parser, not the contract between binary versions.
      • The format integration test technically configured the key provider but (due to WT-17691) only ever wrote one KEK page per run, so the on-disk header was always the writer's own version.
      • Disaggregated storage is new enough that the discipline around "versioned on-disk formats require cross-version evergreen tasks" has not yet been established.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Jie Chen