Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Storage Engines - Foundations
Total Hours with Assigned Team: 1,413.209
Sprint: None
Story Points: None

Summary

The key-provider subsystem persists a binary on-disk header (WT_CRYPT_HEADER) inside every KEK page written to disaggregated storage. That header is versioned (version, compatible_version, header_size) so the format can evolve without breaking older readers. No test in the suite exercises the cross-version readback path.
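To make the versioning contract concrete, here is a minimal sketch of how a reader might decide whether it can parse a header. The three field names come from this ticket; the field widths, their order, the little-endian encoding, and the rule "accept when compatible_version is at or below the reader's version" are all assumptions made for illustration, not the actual WT_CRYPT_HEADER definition.

```python
import struct

# Hypothetical layout: three little-endian uint16 fields. The real
# WT_CRYPT_HEADER widths and ordering are not stated in this ticket.
HEADER_FMT = "<HHH"  # version, compatible_version, header_size
MIN_SIZE = struct.calcsize(HEADER_FMT)  # stand-in for WT_CRYPT_HEADER_MIN_SIZE


def pack_header(version, compatible_version, header_size=MIN_SIZE):
    """Serialize a header the way a writer at `version` might."""
    return struct.pack(HEADER_FMT, version, compatible_version, header_size)


def can_read(raw, reader_version):
    """Return True if a reader at reader_version may parse this header."""
    if len(raw) < MIN_SIZE:
        return False
    version, compatible_version, header_size = struct.unpack_from(HEADER_FMT, raw)
    if header_size < MIN_SIZE or header_size > len(raw):
        return False
    # Assumed rule: a reader accepts a header from a newer writer as long
    # as the writer declared compatibility with the reader's version.
    return compatible_version <= reader_version
```

Under these assumptions, a version-2 reader accepts a header written as `pack_header(3, 1)` and rejects one written as `pack_header(3, 3)`. A cross-version test would check that same outcome using two real binaries rather than a shared helper.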

What's Not Covered

No upgrade test: no test writes a KEK page with an older WT_CRYPT_HEADER layout and then verifies a current-build reader correctly parses it (including handling a smaller header_size and treating post-WT_CRYPT_HEADER_MIN_SIZE fields as absent).
No downgrade test: no test writes a KEK page with the current layout and verifies an older binary, built with knowledge only of compatible_version, can still read it.
No mixed-version coexistence test: no test exercises a single disaggregated page log containing KEK pages from multiple binary versions side-by-side — the realistic upgrade/downgrade scenario for a live system.
No rotation-during-format test: existing format runs technically configure the key provider, but a separate format bug (WT-17691) means only one KEK page is ever written per run. Even today's format coverage does not stress repeated rotation against the on-disk header format.
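The upgrade case above depends on a reader treating fields past the minimum header size as absent when header_size says they were never written. A rough sketch of that behaviour follows. The layout, the trailing `extra_field`, and the function name are hypothetical; only the idea of clamping to header_size comes from this ticket.

```python
import struct

# Hypothetical layout, for illustration only.
BASE_FMT = "<HHH"  # version, compatible_version, header_size
MIN_SIZE = struct.calcsize(BASE_FMT)  # stand-in for WT_CRYPT_HEADER_MIN_SIZE
EXTRA_FMT = "<I"  # hypothetical field appended by a later version
EXTRA_END = MIN_SIZE + struct.calcsize(EXTRA_FMT)


def parse_kek_page(page):
    """Parse a page, treating fields beyond header_size as absent."""
    if len(page) < MIN_SIZE:
        raise ValueError("page shorter than minimum header")
    version, compat, header_size = struct.unpack_from(BASE_FMT, page)
    if not MIN_SIZE <= header_size <= len(page):
        raise ValueError("header_size out of range")
    extra_field = None
    if header_size >= EXTRA_END:
        # Only read the newer field when the writer actually stored it.
        (extra_field,) = struct.unpack_from(EXTRA_FMT, page, MIN_SIZE)
    return {
        "version": version,
        "compatible_version": compat,
        "extra_field": extra_field,
        "payload": page[header_size:],  # payload starts where the header ends
    }


# A page with the older, smaller header and one with the newer header.
old_page = struct.pack(BASE_FMT, 1, 1, MIN_SIZE) + b"KEK"
new_page = struct.pack(BASE_FMT, 2, 1, EXTRA_END) + struct.pack(EXTRA_FMT, 7) + b"KEK"
```

Both pages yield the same payload under this sketch, and only the newer one reports a value for `extra_field`. The missing upgrade test would confirm this against pages produced by an actual older build.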

Why This Matters

Every version bump of WT_CRYPT_HEADER introduces three independent failure modes not caught by existing tests:

Field-layout drift: a future PR moves or resizes a field; older readers using WT_CRYPT_HEADER_MIN_SIZE to clamp their view silently misinterpret bytes.
compatible_version regressions: a writer bumps version without leaving compatible_version at a level older binaries actually support. Nothing fails automatically before release; the first symptom is older binaries rejecting checkpoints in production.
Padding/alignment surprises across header_size boundaries: a field added between the current WT_CRYPT_HEADER_MIN_SIZE and a future minimum changes alignment when older readers truncate to their compile-time size.
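The field-layout drift case can be illustrated with a small sketch. Both layouts below are invented for the example: the "drifted" writer widens compatible_version from 16 to 32 bits, and a reader compiled against the older layout parses the result without raising any error.

```python
import struct

# Hypothetical layouts, for illustration only.
OLD_FMT = "<HHH"    # version, compatible_version, header_size (all 16-bit)
DRIFT_FMT = "<HIH"  # same order, but compatible_version widened to 32 bits


def old_reader(raw):
    """Parse with the layout the older binary was compiled against."""
    version, compat, header_size = struct.unpack_from(OLD_FMT, raw)
    return {
        "version": version,
        "compatible_version": compat,
        "header_size": header_size,
    }


# The newer writer stores header_size = 8 using the drifted layout.
written = struct.pack(DRIFT_FMT, 2, 1, struct.calcsize(DRIFT_FMT))
seen = old_reader(written)
```

Here the older reader still sees plausible version and compatible_version values, but it reads the upper half of the widened field as header_size and gets 0 instead of 8. No exception is raised, which is why a test confined to one binary's view of the layout will not catch this class of bug.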

The Catch2 unit tests in test/catch2/ext/test_key_provider_header.cpp cover the validator's logic against synthetic headers, which is necessary but not sufficient. A synthetic test can show that a given byte pattern is parsed correctly; it cannot show that two real binaries built at different points in history agree on the byte pattern.

What the Missing Test Should Look Like

A proper cross-version test needs to run real binaries against the same disaggregated storage:

Seed a RUNDIR using a binary from the previous release (or develop) with disagg.key_provider=1, running long enough to trigger multiple KEK rotations.
Run the current branch's binary in -R (reopen) mode against that RUNDIR, rotating keys and writing its own KEK pages on top.
Run the previous-release binary again against the now-mixed RUNDIR.
Optionally reverse the order to cover the downgrade direction explicitly.
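The steps above could be driven by a small script along these lines. This is a sketch under stated assumptions: the binary paths are placeholders, the `-h` flag for passing the run directory is an assumption about the format binary's command line, and applying `-R` to the third step is a guess at what "run again" means. Only `-R` and `disagg.key_provider=1` are taken from this ticket.

```python
import subprocess


def cross_version_plan(old_bin, new_bin, rundir, downgrade=False):
    """Return the ordered command lines for the mixed-RUNDIR exercise."""
    # In the downgrade direction the roles of the two binaries are swapped.
    first, second = (new_bin, old_bin) if downgrade else (old_bin, new_bin)
    cfg = "disagg.key_provider=1"
    return [
        [first, "-h", rundir, cfg],         # seed the RUNDIR
        [second, "-R", "-h", rundir, cfg],  # reopen with the other binary
        [first, "-R", "-h", rundir, cfg],   # first binary reads the mixed RUNDIR
    ]


def run_plan(plan):
    """Run each step in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Keeping the plan as plain data makes it straightforward to log each step or translate it into an evergreen task definition. The run lengths needed to trigger multiple KEK rotations are not specified here and would have to be chosen per configuration.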

This is the gap that a new evergreen task in test/evergreen.yml should close: an automated, multi-binary, mixed-RUNDIR exercise that runs on every disagg PR.

Why the Absence Went Unnoticed

The validator's unit tests gave a false sense of coverage — they exercise the parser, not the contract between binary versions.
The format integration test technically configured the key provider but (due to WT-17691) only ever wrote one KEK page per run, so the on-disk header was always the writer's own version.
Disaggregated storage is new enough that the discipline around "versioned on-disk formats require cross-version evergreen tasks" has not yet been established.

is related to: WT-17691 Fix early-load extensions on wts_open and add guardrails (Closed)

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Jie Chen

Created: May 29 2026 06:52:16 AM UTC
Updated: Jun 04 2026 05:28:26 AM UTC
