Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Affects Version/s: None
Component/s: None
Catalog and Routing
During the work on authoritative shards, we added several tasserts and invariants that check that our model is correct and behaves as expected.
If any of them is triggered, whether by an unknown bug or a rare data race, we should react and fail gracefully rather than surface an internal error message to the user.
The idea: if the in-memory state is corrupted and hits an invariant during the commit sequence of a DDL operation or a refresh, we should fall back to clearing the in-memory state and recovering it from disk, assuming the on-disk state is correct and aligned with the CSRS. (If it isn't, that is a separate bug that we should detect with checkMetadataConsistency and that requires manual intervention.) In addition to this fallback, we should still tassert, so that AF assertions are triggered on Atlas and warn about the violation of our contract.
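The proposed fallback could be sketched roughly as follows. This is an illustration in Python for brevity, not server code: `ShardCatalogCache`, `InvariantViolation`, the version-regression check, and the `reported_violations` list (standing in for the tassert report) are all hypothetical names and assumptions made for this example.

```python
import logging

logger = logging.getLogger("shard_catalog_sketch")


class InvariantViolation(Exception):
    """Hypothetical stand-in for a failed tassert / invariant."""


class ShardCatalogCache:
    """Hypothetical in-memory view of a shard's authoritative metadata."""

    def __init__(self, durable_store):
        # Stand-in for the on-disk state, assumed correct and aligned with the CSRS.
        self._durable = durable_store
        self._in_memory = dict(durable_store)
        # Stand-in for the tassert report that would still be raised.
        self.reported_violations = []

    def _check_invariants(self, pending):
        # Example invariant (an assumption): versions never move backwards.
        for ns, version in pending.items():
            if version < self._in_memory.get(ns, 0):
                raise InvariantViolation(f"version regression on {ns}")

    def _recover_from_disk(self):
        # Drop the possibly corrupted in-memory state and reload from disk.
        self._in_memory.clear()
        self._in_memory.update(self._durable)

    def commit(self, pending):
        """Return True on success, False if the cache had to be recovered."""
        try:
            self._check_invariants(pending)
        except InvariantViolation as exc:
            # Still report the violation so that monitoring can alert on it.
            self.reported_violations.append(str(exc))
            logger.error("invariant violated during commit: %s", exc)
            # Fail gracefully: recover instead of surfacing an internal error.
            self._recover_from_disk()
            return False
        self._durable.update(pending)
        self._in_memory.update(pending)
        return True

    def version(self, ns):
        return self._in_memory.get(ns)
```

In this sketch a caller that receives `False` can simply retry the commit against the freshly recovered state. If the on-disk state were itself wrong, the retry would not help, which matches the ticket's point that such a case needs separate detection and manual intervention.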