Type: Technical Debt
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Storage Engines
The goal of this ticket is to design a strategy for evolving binary formats in WiredTiger while preserving cross-version compatibility and ensuring robust, error-tolerant handling.
1. Motivation
WiredTiger currently delays or avoids implementing features that require binary format changes due to backward/forward compatibility concerns. As a result, promising enhancements are postponed indefinitely.
Two main problems arise when binary formats change:
- Files written in newer formats become unreadable by older versions, which can crash when they attempt to read them.
- Even when older versions can parse the data, correctness is not guaranteed (e.g., they may be able to read a file safely but not write to it, or they may misinterpret newer features).
Some minor format changes have been possible by exploiting specific access patterns (e.g., appending to checkpoint data), but a more systematic and safe approach is needed.
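The append-only pattern mentioned above can be sketched as a size-prefixed record. This is a minimal illustration, not WiredTiger's actual checkpoint layout: the `ckpt_info_t` structure, its fields, and `ckpt_decode` are hypothetical names, and host byte order is assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical checkpoint-style record; not WiredTiger's real layout. */
typedef struct {
    uint64_t root_offset;  /* Present since the first format revision. */
    uint64_t record_count; /* Appended in a later revision; 0 if absent. */
} ckpt_info_t;

#define CKPT_V1_SIZE 8u /* Bytes occupied by the original fields. */

/*
 * Decode a buffer laid out as a 4-byte payload size followed by the
 * payload (host byte order assumed). Returns 0 on success and -1 on
 * malformed input; it never reads past the end of the buffer.
 */
static int
ckpt_decode(const uint8_t *buf, size_t len, ckpt_info_t *out)
{
    uint32_t payload;

    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    if (len < sizeof(payload))
        return (-1);
    memcpy(&payload, buf, sizeof(payload));
    if (payload > len - sizeof(payload) || payload < CKPT_V1_SIZE)
        return (-1);
    buf += sizeof(payload);

    memcpy(&out->root_offset, buf, sizeof(out->root_offset));

    /* A newer writer appended record_count; an older writer did not. */
    if (payload >= CKPT_V1_SIZE + sizeof(out->record_count))
        memcpy(&out->record_count, buf + CKPT_V1_SIZE,
            sizeof(out->record_count));

    /* Bytes beyond the fields this reader knows about are ignored. */
    return (0);
}
```

Appending alone only tells a reader that extra bytes exist. It does not tell an older reader whether ignoring them is safe, which is the gap a more systematic approach would need to close.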
2. Goals
Define a strategy for evolving large on-disk binary structures in WiredTiger (e.g., file headers, page headers, checkpoint metadata) that ensures:
- Safe reading by both newer and older WiredTiger versions, without crashes.
- Ability to detect the version of WiredTiger that created the data or the set of features it includes.
- Ability for a version of WiredTiger to determine whether it can read or write a given file of a particular version.
- Acceptable performance, though not necessarily matching the fastest current methods.
Non-goal: This strategy will NOT apply to performance-critical or frequently accessed structures (e.g., items within a leaf page).
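One possible way to meet the detection and read/write goals above is to store three feature bitmasks in the structure, a scheme used by some file systems. The sketch below is an assumption for discussion, not an existing WiredTiger design: `feature_sets_t`, `access_t`, and `feature_check` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACCESS_NONE, ACCESS_READ_ONLY, ACCESS_READ_WRITE } access_t;

/* Hypothetical feature sets recorded by the writer of a structure. */
typedef struct {
    uint32_t compat;    /* Unknown bits are safe to ignore. */
    uint32_t ro_compat; /* Unknown bits permit reading, not writing. */
    uint32_t incompat;  /* Unknown bits mean the data must be refused. */
} feature_sets_t;

/*
 * Compare the features recorded in a file against the features this
 * build understands and return the strongest access that is safe.
 */
static access_t
feature_check(const feature_sets_t *file, const feature_sets_t *supported)
{
    assert(file != NULL && supported != NULL);

    /* Any unknown incompatible feature: refuse with an error, no crash. */
    if ((file->incompat & ~supported->incompat) != 0)
        return (ACCESS_NONE);

    /* Any unknown read-only-compatible feature: open read-only. */
    if ((file->ro_compat & ~supported->ro_compat) != 0)
        return (ACCESS_READ_ONLY);

    /* Unknown bits in the compat set are deliberately not checked. */
    return (ACCESS_READ_WRITE);
}
```

With this split, a new feature declares its own compatibility class when it is introduced, so an older build can decide what is safe without knowing what the feature does.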
3. Expected Deliverables
- A documented format or schema supporting feature/version detection.
- Guidelines for modifying binary layouts, including a list of the structures affected by this change.
- A compatibility matrix or validation logic that identifies readable/writable states. This needs to be maintained in the future.
- Prototype implementation or simulation for evaluation.
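As a starting point for the compatibility-matrix deliverable, the matrix could be expressed as a small table keyed by format revision. Everything in this sketch is illustrative: the revision numbers, release numbers, and the names `compat_row_t` and `compat_lookup` are assumptions, not existing WiredTiger identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { COMPAT_NONE, COMPAT_READ, COMPAT_READ_WRITE } compat_t;

/* One row per on-disk format revision; all values are made up. */
typedef struct {
    uint32_t format;     /* Format revision stored in the file. */
    uint32_t min_reader; /* Oldest library release allowed to read it. */
    uint32_t min_writer; /* Oldest library release allowed to modify it. */
} compat_row_t;

static const compat_row_t compat_matrix[] = {
    {1, 1, 1},
    {2, 1, 2}, /* Release 1 can still read, but must not write. */
    {3, 3, 3}, /* Breaking change: older releases must refuse. */
};

/* Return what a given library release may do with a format revision. */
static compat_t
compat_lookup(uint32_t format, uint32_t release)
{
    size_t i;

    for (i = 0; i < sizeof(compat_matrix) / sizeof(compat_matrix[0]); i++) {
        const compat_row_t *row = &compat_matrix[i];

        /* Writing implies reading, so a row must keep this ordering. */
        assert(row->min_reader <= row->min_writer);
        if (row->format != format)
            continue;
        if (release >= row->min_writer)
            return (COMPAT_READ_WRITE);
        if (release >= row->min_reader)
            return (COMPAT_READ);
        return (COMPAT_NONE);
    }

    /* Unknown format revision: refuse rather than guess. */
    return (COMPAT_NONE);
}
```

An older release cannot consult rows added after it shipped, so in practice the writer would likely need to record `min_reader` and `min_writer` (or equivalent feature information) in the file itself. A table like this would then serve as the documented matrix and as a test oracle.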
is related to:
- WT-14644 Use extensible address cookies for disaggregated storage and beyond (Needs Scheduling)
related to:
- WT-10307 Enable point in time truncate functionality for MongoDB (Closed)
- WT-7408 API to return row and byte counts for objects and cursor ranges (Backlog)
- WT-10457 Modify data format to support statistics cursor for byte and record counts (Backlog)
- WT-11631 Extensible, future-proof on-disk format (Open)
- WT-10139 Add record count field for table (Closed)
- WT-14613 Move block header to beginning of page images (Needs Scheduling)