POC: reuse PIT logic to issue internal "repair" writes from WiredTiger


    • Type: Sub-task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Not Applicable
    • Storage Engines, Storage Engines - Persistence
    • 0.003
    • SE Persistence backlog

      Motivation

      Parent WT-17151 calls for a limited, critical "spot-fix" write path inside WiredTiger — for example, rewriting a value in the metadata file (such as a colliding table ID) without going through the regular user-facing write API. We want to know whether the point-in-time (PIT) write logic already in the disagg branch can be reused as the mechanism for those repair writes, instead of building a new internal write path from scratch.

      Goal

      Build a proof-of-concept demonstrating that a WT-internal "repair" function can perform durable writes via the existing PIT logic when invoked through the normal MongoDB server -> WT API path.

      Scope of work

      • Identify a suitable entry point inside WT (e.g. salvage / a dedicated repair worker) where a repair function can be hooked in on the disagg branch.
      • Wire that entry point to perform an internal write that reuses the PIT logic: first a create via __wt_schema_create, then a modification of an existing table's metadata entry (e.g. changing a file ID).
      • Drive the path end-to-end from a MongoDB jstest (e.g. jstests/disagg_storage/magic_restore_clean_params.js) so the full server -> WT path is exercised, not just a unit test.
      • Verify the resulting on-disk artifacts: confirm the .wt file is created with valid headers, and that the metadata-edit variant produces a readable, consistent btree.
      • Capture findings (what worked, locking considerations, API gaps, blockers) in a comment on this ticket so the team can decide whether to productionize the approach.

      Out of scope

      • Productionizing the repair API, adding configuration knobs, or shipping it on a release branch.
      • Repairing arbitrary corruption beyond the metadata / file-ID case.
      • Performance, error injection, or stress coverage.

      Definition of Done

      • A WT-internal function on the disagg branch performs a write through PIT logic when invoked via the standard server -> WT path.
      • Both the create-new-table and modify-existing-metadata variants have been demonstrated end-to-end from a jstest that exits with status 0.
      • Resulting files have been validated (header magic, btree readable via the read-path tool / wt CLI).
      • A summary comment is posted on this ticket covering: viability verdict, locking model used, any API limitations found, and recommended next steps for a real repair API.
      • No code from the POC is intended to merge as-is; the branch/diff is linked in the ticket for reference.
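
      The "header magic" validation above could be scripted along the following lines. This is a minimal sketch, not part of the POC: it assumes the file begins with the standard WiredTiger block descriptor (little-endian magic, major version, minor version, checksum, unused) and that the magic value is 120897, as WT_BLOCK_MAGIC is defined in mainline block.h. Both assumptions should be checked against the disagg branch before relying on the result, and the helper names are hypothetical.

```python
import struct

# ASSUMPTION: value of WT_BLOCK_MAGIC in mainline WiredTiger (src/include/block.h).
# Confirm against the disagg branch before trusting this check.
WT_BLOCK_MAGIC = 120897

# ASSUMPTION: descriptor layout is magic (u32), major (u16), minor (u16),
# checksum (u32), unused (u32), stored little-endian at offset 0.
DESC_FORMAT = "<IHHII"


def read_block_desc(path):
    """Return (magic, major, minor) parsed from the first bytes of the file."""
    size = struct.calcsize(DESC_FORMAT)
    with open(path, "rb") as f:
        raw = f.read(size)
    if len(raw) < size:
        raise ValueError(f"{path}: file too short to hold a descriptor block")
    magic, major, minor, _checksum, _unused = struct.unpack(DESC_FORMAT, raw)
    return magic, major, minor


def has_valid_magic(path):
    """True if the file starts with the assumed WiredTiger block magic."""
    magic, _major, _minor = read_block_desc(path)
    return magic == WT_BLOCK_MAGIC
```

      A check like this only covers the header. Confirming that the btree is readable still needs the read-path tool or the wt CLI, as listed above.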

            Assignee:
            Jasmine Bi
            Reporter:
            Etienne Petrel