Fallback mechanism if corrupted state is found in CSS

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing

      During the work on authoritative shards, we added several tasserts and invariants that check that our model is correct and behaves as expected.

      If any of them is triggered, due to an unknown bug or a rare data race, we should react and fail gracefully rather than surface an internal error message to the user.

      The idea: if the in-memory state is corrupted and hits any invariant during the commit sequence of a DDL operation or a refresh, we should fall back to clearing the in-memory state and recovering it from disk, assuming the on-disk state is correct and aligned with the CSRS. (If it isn't, that is a separate bug, which checkMetadataConsistency should detect and which requires manual intervention.) Apart from this fallback, we should still tassert, so that AF assertions on Atlas fire and warn about the violation of our contract.
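
      A minimal C++ sketch of the proposed fallback, under simplifying assumptions. The sketch models metadata as a map from namespace to a version number, assumes the commit sequence persists the new version to disk before updating memory, and uses "versions only move forward" as the example invariant. Every name here (MetadataCache, InvariantViolation, checkInvariant, corrupt) is hypothetical and is not MongoDB's real tassert or cache API; the check simply throws a C++ exception so the caller can catch it, count it as a reported violation, and rebuild the in-memory state from the on-disk copy.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a failed tassert: throws so the caller can fall back.
struct InvariantViolation : std::runtime_error {
    using std::runtime_error::runtime_error;
};

void checkInvariant(bool condition, const std::string& message) {
    if (!condition) {
        throw InvariantViolation(message);
    }
}

// Hypothetical model: namespace -> version. The "disk" map plays the role of
// the persisted state, assumed correct and aligned with the CSRS.
using VersionMap = std::map<std::string, long long>;

class MetadataCache {
public:
    explicit MetadataCache(VersionMap& disk) : _disk(disk) {}

    // Commit step of a DDL or refresh: persist first, then update memory.
    void commit(const std::string& nss, long long newVersion) {
        _disk[nss] = newVersion;
        try {
            auto it = _memory.find(nss);
            // Example invariant: in-memory versions only move forward.
            checkInvariant(it == _memory.end() || it->second < newVersion,
                           "in-memory version is not older than the committed one");
            _memory[nss] = newVersion;
        } catch (const InvariantViolation&) {
            // Still report the violation (stands in for the tassert / alert)...
            ++_violationsReported;
            // ...but recover instead of surfacing an internal error to the user.
            recoverFromDisk();
        }
    }

    // Fallback: discard the whole in-memory state and rebuild it from disk.
    void recoverFromDisk() { _memory = _disk; }

    // Hypothetical test hook that simulates corruption from a bug or race.
    void corrupt(const std::string& nss, long long bogusVersion) {
        _memory[nss] = bogusVersion;
    }

    long long version(const std::string& nss) const { return _memory.at(nss); }
    int violationsReported() const { return _violationsReported; }

private:
    VersionMap& _disk;
    VersionMap _memory;
    int _violationsReported = 0;
};
```

      For example, committing version 1 for "db.coll", corrupting the cached entry to 5, and then committing version 2 leaves the cache at version 2 with one reported violation, instead of failing the commit. Reloading everything, rather than only the offending entry, reflects the assumption that once one invariant fails, no part of the in-memory state can be trusted.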

            Assignee:
            Unassigned
            Reporter:
            Pol Pinol
            Votes:
            0
            Watchers:
            1