When a customer encounters a data corruption issue, the TSE uses the available information to determine the bucket of corruption (storage HW, index vs collection mismatch, known CA, etc.).
TS currently labels all data corruption as "Physical Data Consistency & Corruption". This includes corruption on the storage HW, which should be tracked separately: it is tracked today, but under the shared label it pollutes the signal of corruption we can investigate. This ticket is to ensure that WT log messages clearly identify whether the corruption is storage HW related so the TSE can mark it correctly. Once the TSE can identify it, they can use a different tag for the issue and its frequency can be measured accordingly.
- Does this affect any team outside of WT?
Yes, TSE. This is not blocking them. This is an improvement where helping TS helps WT.
- How likely is it that this use case or problem will occur?
Not frequent. But when it happens, it's important.
- If the problem does occur, what are the consequences and how severe are they?
The corruption metric is miscounted, and the priority of issues is misidentified.
- Is this issue urgent?
Somewhat urgent. The sooner it is correct, the sooner we get better data and know the priority of issues.
Acceptance Criteria (Definition of Done)
This ticket is complete when we have clear log messages, documented in a TS KB, for identifying storage HW level corruption (e.g. checksum mismatch).
Unit and functional