Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Engines - Foundations
686.729
Summary
The key-provider subsystem persists a binary on-disk header (WT_CRYPT_HEADER) inside every KEK page written to disaggregated storage. That header is versioned (version, compatible_version, header_size) so the format can evolve without breaking older readers. No test in the suite exercises the cross-version readback path.
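As a rough illustration of the contract those three fields imply, the sketch below shows how a reader might decide whether to accept a page and how many header bytes it may interpret. Everything here is an assumption made for illustration: `SketchHeader`, `SKETCH_MIN_SIZE`, the field widths, the little-endian encoding, and the rule "accept when compatible_version is not newer than the reader" are hypothetical stand-ins, not the real WT_CRYPT_HEADER layout or validator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the versioned prefix of WT_CRYPT_HEADER.
// The real field order, widths, and padding are not given in this ticket.
struct SketchHeader {
    uint8_t version;            // format version the writer produced
    uint8_t compatible_version; // oldest reader version assumed able to parse the page
    uint16_t header_size;       // number of header bytes the writer emitted
};

// Assumed analogue of WT_CRYPT_HEADER_MIN_SIZE: the bytes every version shares.
constexpr std::size_t SKETCH_MIN_SIZE = 4;

// Decode the shared prefix and apply the assumed compatibility rule.
// Returns std::nullopt when the page is too short, the declared size is
// implausible, or the writer requires a newer reader than reader_version.
std::optional<SketchHeader> read_header(const std::vector<uint8_t> &page, uint8_t reader_version)
{
    if (page.size() < SKETCH_MIN_SIZE)
        return std::nullopt;

    SketchHeader h{};
    h.version = page[0];
    h.compatible_version = page[1];
    h.header_size = static_cast<uint16_t>(page[2] | (page[3] << 8)); // little-endian (assumed)

    if (h.header_size < SKETCH_MIN_SIZE || h.header_size > page.size())
        return std::nullopt;
    if (h.compatible_version > reader_version)
        return std::nullopt;
    return h;
}

// A reader may only interpret bytes that both it and the writer know about;
// fields beyond this count must be treated as absent.
std::size_t usable_header_bytes(const SketchHeader &h, std::size_t reader_struct_size)
{
    assert(reader_struct_size >= SKETCH_MIN_SIZE);
    return std::min<std::size_t>(h.header_size, reader_struct_size);
}
```

Under these assumptions, a version-1 reader given a version-2 page with compatible_version 1 accepts it but interprets only the bytes it was compiled to understand. A cross-version test would check exactly this behaviour using two real binaries rather than a single parser.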
What's Not Covered
- No upgrade test: no test writes a KEK page with an older WT_CRYPT_HEADER layout and then verifies a current-build reader correctly parses it (including handling a smaller header_size and treating post-WT_CRYPT_HEADER_MIN_SIZE fields as absent).
- No downgrade test: no test writes a KEK page with the current layout and verifies an older binary, built with knowledge only of compatible_version, can still read it.
- No mixed-version coexistence test: no test exercises a single disaggregated page log containing KEK pages from multiple binary versions side-by-side — the realistic upgrade/downgrade scenario for a live system.
- No rotation-during-format test: existing format runs technically configure the key provider, but a separate format bug (WT-17691) means only one KEK page is ever written per run. Even today's format coverage does not stress repeated rotation against the on-disk header format.
Why This Matters
Every version bump of WT_CRYPT_HEADER introduces three independent failure modes not caught by existing tests:
- Field-layout drift: a future PR moves or resizes a field; older readers using WT_CRYPT_HEADER_MIN_SIZE to clamp their view silently misinterpret bytes.
- compatible_version regressions: a writer bumps version without leaving compatible_version at a level older binaries actually support. Nothing fails automatically in testing; the first symptom is older binaries rejecting checkpoints in production.
- Padding/alignment surprises across header_size boundaries: a field added between the current WT_CRYPT_HEADER_MIN_SIZE and a future minimum changes alignment when older readers truncate to their compile-time size.
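The first failure mode can be made concrete with a small sketch. Both layouts below are hypothetical and exist only to show the mechanism: an "old" build expects a key identifier at byte 4, while a later change inserts a flags byte at that offset without raising compatible_version. The names `write_new_layout` and `old_reader_key_id` and the field choices are assumptions, not part of the real WT_CRYPT_HEADER.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts; neither is the real WT_CRYPT_HEADER.
//   old build: [version][compatible_version][size lo][size hi][key_id]
//   new build: [version][compatible_version][size lo][size hi][flags][key_id]
constexpr std::size_t OLD_HEADER_SIZE = 5;
constexpr std::size_t NEW_HEADER_SIZE = 6;

// Writer from the newer build. It bumps version to 2 but leaves
// compatible_version at 1, claiming old readers can still parse the page.
std::array<uint8_t, NEW_HEADER_SIZE> write_new_layout(uint8_t flags, uint8_t key_id)
{
    std::array<uint8_t, NEW_HEADER_SIZE> page = {
        {2, 1, static_cast<uint8_t>(NEW_HEADER_SIZE), 0, flags, key_id}};
    return page;
}

// Reader from the older build. It trusts compatible_version, clamps its view
// to OLD_HEADER_SIZE bytes, and takes byte 4 as key_id. No error is raised,
// but the value it returns is actually the new build's flags byte.
uint8_t old_reader_key_id(const std::array<uint8_t, NEW_HEADER_SIZE> &page)
{
    static_assert(OLD_HEADER_SIZE <= NEW_HEADER_SIZE, "old view must fit in the page");
    return page[OLD_HEADER_SIZE - 1];
}
```

In this sketch, a page written with flags 0x80 and key_id 7 is read back by the old reader as key_id 0x80, with every validity check passing. A unit test that builds the header and parses it with the same source tree would not notice the mismatch, because writer and reader always share one layout.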
The Catch2 unit tests in test/catch2/ext/test_key_provider_header.cpp cover the validator's logic against synthetic headers, which is necessary but not sufficient. A synthetic test can show that a given byte pattern is parsed correctly; it cannot show that two real binaries built at different points in history agree on the byte pattern.
What the Missing Test Should Look Like
A proper cross-version test needs to run real binaries against the same disaggregated storage:
- Seed a RUNDIR using a binary from the previous release (or the develop branch) with disagg.key_provider=1, running long enough to trigger multiple KEK rotations.
- Run the current branch's binary in -R (reopen) mode against that RUNDIR, rotating keys and writing its own KEK pages on top.
- Run the previous-release binary again against the now-mixed RUNDIR and confirm it can still read the KEK pages written by both binaries.
- Optionally reverse the order to cover the downgrade direction explicitly.
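The steps above can be sketched as a small plan builder that a driver script or evergreen task could iterate over. This is only an outline under stated assumptions: the `Step` type, the function names, and the binary paths passed in are hypothetical, and the only options rendered are the two named in this ticket (`-R` and `disagg.key_provider=1`). The option that selects the RUNDIR is deliberately left out because the ticket does not name it.

```cpp
#include <string>
#include <vector>

// One invocation of a format binary against the shared RUNDIR (hypothetical type).
struct Step {
    std::string binary;  // path to the build that runs this step
    bool reopen;         // true: run in -R (reopen) mode against the existing RUNDIR
    std::string purpose; // what the step is meant to establish
};

// Build the three-step sequence. With downgrade_first == false the old binary
// seeds the RUNDIR (upgrade direction); with true the roles are swapped.
std::vector<Step> cross_version_plan(
    const std::string &old_bin, const std::string &new_bin, bool downgrade_first)
{
    const std::string &first = downgrade_first ? new_bin : old_bin;
    const std::string &second = downgrade_first ? old_bin : new_bin;
    return {
        {first, false, "seed RUNDIR and run long enough for several KEK rotations"},
        {second, true, "reopen, rotate keys, and write KEK pages on top"},
        {first, true, "reopen the now-mixed RUNDIR and read back every KEK page"},
    };
}

// Render the part of the command line this ticket actually specifies.
// The RUNDIR option and any other format configuration are intentionally omitted.
std::string render(const Step &step)
{
    std::string cmd = step.binary;
    if (step.reopen)
        cmd += " -R";
    cmd += " disagg.key_provider=1";
    return cmd;
}
```

Running the plan once with `downgrade_first` set to false and once with it set to true would cover both directions with the same driver. Each step should fail the task if the binary exits non-zero.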
This is the gap that a new evergreen task in test/evergreen.yml should close: an automated, multi-binary, mixed-RUNDIR exercise that runs on every disagg PR.
Why the Absence Went Unnoticed
- The validator's unit tests gave a false sense of coverage — they exercise the parser, not the contract between binary versions.
- The format integration test technically configured the key provider but (due to WT-17691) only ever wrote one KEK page per run, so the on-disk header was always the writer's own version.
- Disaggregated storage is new enough that the discipline around "versioned on-disk formats require cross-version evergreen tasks" has not yet been established.
is related to: WT-17691 Fix early-load extensions on wts_open and add guardrails (Closed)