-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Block Manager
-
None
-
Storage Engines - Persistence
-
33.796
-
SE Persistence backlog
-
None
AF-16606 aborted in __wti_block_disagg_write_internal for file:collection-cd5191a4-...wt, a plain .wt (non-disagg) file that should never have been routed to the disagg block manager. WT-17301 is investigating why these files exist; this ticket covers how, once one exists, the routing layer lets it reach the disagg block manager and propagate to followers.
Gates to audit:
- __wt_block_disagg_manager_owns_object: only checks page_log != NULL. No suffix, WT_BTREE_DISAGGREGATED, or conn-is-disagg check. Source comment flags this.
- __btree_setup_page_log: attaches a page_log whenever WT_BTREE_DISAGGREGATED is set, which fires on .wt_stable suffix OR block_manager=disagg config. FIXME-WT-14721.
- __wti_block_disagg_checkpoint_resolve: accepts .wt names ("can happen in tests") and publishes them to shared metadata, propagating misrouting to followers.
- __disagg_apply_checkpoint_meta: follower replays shared metadata blindly; no validation that file:* keys are .wt_stable. FIXME-WT-14730.
- BM cast: (WT_BLOCK_DISAGG *)bm->block is unchecked. FIXME-WT-15564 + TODO at block_disagg_open.c.
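For the last gate, one way an unchecked downcast could be guarded is sketched below. This is illustrative only: the type tag, struct layouts, and block_as_disagg helper are all hypothetical and are not taken from the WiredTiger source; the real WT_BLOCK_DISAGG may carry no such tag today.

```c
#include <stddef.h>

/* Hypothetical tag recording which concrete block type sits behind a handle. */
enum block_kind { BLOCK_KIND_FILE, BLOCK_KIND_DISAGG };

/* Hypothetical common header; it must be the first member of every block type. */
struct block_header {
    enum block_kind kind;
};

/* Hypothetical stand-in for WT_BLOCK_DISAGG. */
struct block_disagg {
    struct block_header hdr; /* first member, so a header pointer aliases the struct */
    const char *name;
};

/*
 * Return the handle as a disagg block only if its tag says so, otherwise NULL.
 * The caller can then fail with a clear error instead of dereferencing a
 * mis-typed pointer.
 */
static struct block_disagg *
block_as_disagg(void *block)
{
    struct block_header *hdr = block;

    if (hdr == NULL || hdr->kind != BLOCK_KIND_DISAGG)
        return (NULL);
    return ((struct block_disagg *)block);
}
```

With a check of this shape, a misrouted handle would surface as an explicit error at the cast site rather than as undefined behavior further down the write path.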
Proposed Approach:
- Tighten __wt_block_disagg_manager_owns_object (require .wt_stable and/or WT_BTREE_DISAGGREGATED and/or conn-is-disagg).
- Reject non-.wt_stable names in __wti_block_disagg_checkpoint_resolve.
- Validate file:* keys at follower replay.
- Pre-abort dump in __wti_block_disagg_write_internal (dhandle, flags, page_log, leader/follower, recon context). This would have made AF-16606 triageable from logs.
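A minimal sketch of the first three items, under stated assumptions: routing_ctx, has_suffix, disagg_owns_object, and shared_meta_key_ok are hypothetical names that stand in for state the real functions would read from the dhandle, btree, and connection. The ticket leaves the combination open ("and/or"); this sketch arbitrarily picks the strictest variant and requires all conditions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if name ends with suffix. */
static bool
has_suffix(const char *name, const char *suffix)
{
    size_t nlen = strlen(name), slen = strlen(suffix);

    return (nlen >= slen && strcmp(name + nlen - slen, suffix) == 0);
}

/* Hypothetical stand-in for the state the ownership check would consult. */
struct routing_ctx {
    const char *name;         /* object name, e.g. "file:foo.wt_stable" */
    bool has_page_log;        /* page_log != NULL (the only check today) */
    bool btree_disaggregated; /* WT_BTREE_DISAGGREGATED is set */
    bool conn_is_disagg;      /* connection is configured for disagg */
};

/* Strictest variant: every condition must hold, not just page_log != NULL. */
static bool
disagg_owns_object(const struct routing_ctx *ctx)
{
    return (ctx->has_page_log && ctx->btree_disaggregated && ctx->conn_is_disagg &&
      has_suffix(ctx->name, ".wt_stable"));
}

/*
 * Validation usable at checkpoint resolve and at follower replay: a file:
 * key is acceptable only if it names a .wt_stable object. Other key
 * prefixes are outside the scope of this check and pass through.
 */
static bool
shared_meta_key_ok(const char *key)
{
    if (strncmp(key, "file:", 5) != 0)
        return (true);
    return (has_suffix(key, ".wt_stable"));
}
```

Under these assumptions, a plain .wt name such as the one in AF-16606 would be refused at the ownership check, and a file: key without the .wt_stable suffix would be rejected before it is published to or replayed from shared metadata.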