- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Statistics
- Storage Engines - Persistence
The idea is to investigate whether we can track how often a subsystem fails.
We already have specific stats (e.g. cache_evict_split_failed_lock, cache_eviction_blocked_multi_block_reconciliation_during_checkpoint) for when a function fails at a very specific location in the code. However, can we track how many failures occur in a subsystem as a whole? This could tell us whether a subsystem is fragile, that is, whether it hits errors often but handles them gracefully. It would also help us track down when an issue first occurred and measure its time to detection.
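As a starting point for discussion, here is a minimal sketch of what a per-subsystem failure counter could look like. Everything in it is hypothetical: the subsystem list, the `subsys_failure_*` names, and the `evict_page_example` call site are invented for illustration and do not correspond to existing code or statistics. It also assumes timestamps are non-zero, since zero is used to mean "no failure yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subsystem identifiers; not taken from the real code base. */
typedef enum {
    SUBSYS_EVICTION,
    SUBSYS_CHECKPOINT,
    SUBSYS_RECONCILIATION,
    SUBSYS_COUNT
} subsys_id;

/* One slot per subsystem: total failures plus first/last failure times. */
typedef struct {
    atomic_uint_fast64_t failures;
    atomic_int_fast64_t first_failure_sec; /* 0 means "no failure yet" */
    atomic_int_fast64_t last_failure_sec;
} subsys_failure_stats;

/* Static storage is zero-initialized, so all counters start at 0. */
static subsys_failure_stats failure_stats[SUBSYS_COUNT];

/* Record one failure in a subsystem, regardless of the exact call site. */
static void
subsys_failure_record(subsys_id id, int64_t now_sec)
{
    int_fast64_t unset = 0;

    assert((unsigned)id < (unsigned)SUBSYS_COUNT);
    atomic_fetch_add(&failure_stats[id].failures, 1);
    /* Only the first recorded failure replaces the 0 sentinel. */
    atomic_compare_exchange_strong(
      &failure_stats[id].first_failure_sec, &unset, (int_fast64_t)now_sec);
    atomic_store(&failure_stats[id].last_failure_sec, (int_fast64_t)now_sec);
}

/* Read the total number of failures recorded for a subsystem. */
static uint64_t
subsys_failure_count(subsys_id id)
{
    assert((unsigned)id < (unsigned)SUBSYS_COUNT);
    return ((uint64_t)atomic_load(&failure_stats[id].failures));
}

/*
 * Hypothetical call site: the caller still handles the error gracefully,
 * but the failure is also counted against the owning subsystem.
 */
static int
evict_page_example(int simulated_ret, int64_t now_sec)
{
    if (simulated_ret != 0)
        subsys_failure_record(SUBSYS_EVICTION, now_sec);
    return (simulated_ret);
}
```

With something along these lines, the failure count per subsystem would indicate how fragile it is, and comparing `first_failure_sec` with the time a problem was actually noticed would give a rough time to detection. Whether the counters belong in the existing statistics framework or in a separate structure is an open question for the investigation.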