-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Test Csuite
-
None
-
Storage Engines - Foundations
-
SE Foundations - 2026-03-13, SE Foundations - 2026-03-27
-
5
On a follower, the ingest table (theoretically) exists only in memory. This makes the data in the system hard to debug once we notice that something is wrong. Consider a test/format run in "multi-mode", using a predictable set of operations on both leader and follower processes. While the operations continue, the leader creates a checkpoint and the follower picks it up. Both sides stop after applying N operations and want to compare their data.
The leader and follower can each do an in-memory scan of their entire dataset and produce a checksum. Suppose the checksums differ: how do we narrow down the differences to a single differing key? (If there are many differences, the first differing key will do.)
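To make the "first differing key" goal concrete, here is a minimal sketch, not a proposed implementation. It assumes each side can somehow produce its in-memory scan as a key-sorted sequence of (key, value) pairs, which is exactly the capability this ticket asks for. The function name `first_difference` and the tuple layout are hypothetical, and nothing here uses the WiredTiger API.

```python
from itertools import zip_longest

# Sentinel marking "this side's scan has ended"; it never equals a real pair.
_END = object()


def first_difference(leader_scan, follower_scan):
    """Return (key, leader_value, follower_value) for the first mismatch.

    Both arguments are iterables of (key, value) pairs sorted by key, with
    unique keys. A value of None means the key is absent on that side.
    Returns None when the two scans are identical.
    """
    for lead, follow in zip_longest(leader_scan, follower_scan, fillvalue=_END):
        if lead == follow:
            continue  # same key and same value on both sides
        if lead is _END:
            return (follow[0], None, follow[1])  # follower has an extra key
        if follow is _END:
            return (lead[0], lead[1], None)  # leader has an extra key
        lkey, lval = lead
        fkey, fval = follow
        if lkey == fkey:
            return (lkey, lval, fval)  # same key, different values
        # All earlier pairs matched, so the smaller key is missing on the
        # other side and is the first differing key.
        if lkey < fkey:
            return (lkey, lval, None)
        return (fkey, None, fval)
    return None


if __name__ == "__main__":
    leader = [(b"a", b"1"), (b"b", b"2"), (b"c", b"3")]
    follower = [(b"a", b"1"), (b"b", b"9"), (b"c", b"3")]
    print(first_difference(leader, follower))
```

Because the walk stops at the first mismatch, it reads only as much of each scan as needed. The open problem remains getting both scans in front of one comparer while the connections are still active.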
This is currently an unsolved problem. We could checkpoint both processes and then have a single process open multiple WT connections to compare the files (tools/wt_cmp_dir is such a script). Checkpointing puts everything into disagg or on disk, right? Not quite: the ingest table has no disk presence. So lots of data that shouldn't differ might appear different, or, if we stopped at the checkpoint we just picked up, both sides would appear the same even if their ingest tables differed.
We need a way to identify and examine the differences in layered tables from two different active connections.
- related to
-
WT-16483 test/format (multi-node disagg) data mismatch b/w leader and follower
-
- Blocked
-