- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Storage Engines, Storage Engines - Foundations, Storage Engines - Persistence
- SE Foundations - Q3+ Backlog
- 13
Currently, WiredTiger supports crash testing during checkpoints through the WT_SESSION.checkpoint.checkpoint_crash_point field. While this feature has been valuable for reproducing complex bugs involving crashes, it is limited in scope and requires adding crash-specific code to trigger a crash at a specific point. Furthermore, it only crashes between tables, not while a checkpoint is processing a table.
To expand crash testing capabilities and improve the reproducibility of issues across various subsystems, I propose implementing a generic background thread designed for crash testing.
Key Features:
- Configurable activation:
- A new connection configuration field would enable/disable the crash testing thread.
- Function pointer for crash conditions:
- The thread would be assigned a user-defined function pointer to determine appropriate crash conditions dynamically.
- Subsystem-specific crash scenarios:
- For crash testing in the checkpoint subsystem, the thread could periodically check conditions such as whether a checkpoint has started and, based on those, trigger a crash after a timer expires or when other rules are met.
- Conditions could be time-based, event-driven, or a mix of both, depending on testing requirements.
- Extensibility:
- The implementation would lay the groundwork for crash scenarios across other subsystems without requiring intrusive code changes in each subsystem.
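The features above could be sketched roughly as follows. This is a minimal illustration using POSIX threads, not WiredTiger code: every name in it (crash_thread, crash_cond_fn, crash_action_fn, demo_ctx, and so on) is hypothetical, and the connection configuration field that would enable the thread is not modelled. The crash action is made pluggable here only so the polling logic can be exercised without killing the process; by default it calls abort().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback types; none of these names exist in WiredTiger. */
typedef bool (*crash_cond_fn)(void *ctx);   /* returns true when it is time to crash */
typedef void (*crash_action_fn)(void *ctx); /* what "crashing" means for this run */

typedef struct {
    pthread_t tid;
    atomic_bool stop;           /* set to ask the thread to exit without crashing */
    crash_cond_fn should_crash; /* user-supplied crash condition */
    crash_action_fn do_crash;   /* defaults to abort() */
    void *ctx;                  /* opaque state passed to both callbacks */
    long poll_ns;               /* polling interval, must be below one second */
} crash_thread;

/* Default action: terminate the process abruptly. */
static void
crash_abort(void *ctx)
{
    (void)ctx;
    abort();
}

/* Thread body: poll the condition until it fires or a stop is requested. */
static void *
crash_thread_run(void *arg)
{
    crash_thread *ct = arg;
    struct timespec pause = {0, ct->poll_ns};

    while (!atomic_load(&ct->stop)) {
        if (ct->should_crash(ct->ctx)) {
            ct->do_crash(ct->ctx);
            break;
        }
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Start the thread; returns 0 on success, like pthread_create. */
static int
crash_thread_start(crash_thread *ct, crash_cond_fn cond, crash_action_fn action,
    void *ctx, long poll_ns)
{
    assert(cond != NULL);
    assert(poll_ns >= 0 && poll_ns < 1000000000L);

    atomic_init(&ct->stop, false);
    ct->should_crash = cond;
    ct->do_crash = action != NULL ? action : crash_abort;
    ct->ctx = ctx;
    ct->poll_ns = poll_ns;
    return pthread_create(&ct->tid, NULL, crash_thread_run, ct);
}

/* Ask the thread to exit and wait for it. */
static int
crash_thread_stop(crash_thread *ct)
{
    atomic_store(&ct->stop, true);
    return pthread_join(ct->tid, NULL);
}

/* Demo callbacks: crash once a flag is raised, but only record the event. */
typedef struct {
    atomic_bool trigger; /* raised by the test when the crash window opens */
    atomic_bool fired;   /* set by the dry-run action instead of aborting */
} demo_ctx;

static bool
demo_cond(void *ctx)
{
    return atomic_load(&((demo_ctx *)ctx)->trigger);
}

static void
demo_dry_run(void *ctx)
{
    atomic_store(&((demo_ctx *)ctx)->fired, true);
}
```

Passing NULL as the action selects the abort() default, which is what a real crash test would want; the dry-run action is only useful for checking that a condition fires when expected.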
Benefits:
- Increased coverage for crash testing, exposing edge cases across multiple subsystems.
- Greater flexibility in designing crash scenarios.
- Simplified implementation, reducing the need for subsystem-specific crash code.
Example Use Case:
In the checkpoint subsystem, the thread could detect when a checkpoint begins and schedule a crash at a random point within that process or based on specific stages/events during the checkpoint lifecycle.
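One way the crash condition for this use case might look is sketched below. It assumes the crash thread can observe whether a checkpoint is running and when it started; ckpt_state, ckpt_crash_plan and ckpt_should_crash are hypothetical names, not WiredTiger structures or functions. The random delay comes from a small seeded generator so that a failing run could be replayed with the same seed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of checkpoint state; not a WiredTiger structure. */
typedef struct {
    bool running;      /* true while a checkpoint is in progress */
    uint64_t start_ms; /* time at which the current checkpoint began */
} ckpt_state;

/* Hypothetical bookkeeping owned by the crash condition. */
typedef struct {
    bool armed;            /* a crash is scheduled for the current checkpoint */
    uint64_t crash_at_ms;  /* absolute time at which to crash */
    uint64_t max_delay_ms; /* upper bound (inclusive) on the random delay */
    uint32_t seed;         /* generator state, kept so runs can be replayed */
} ckpt_crash_plan;

/* Small linear congruential step; returns a delay in [0, max_delay_ms]. */
static uint64_t
next_delay_ms(ckpt_crash_plan *plan)
{
    plan->seed = plan->seed * 1103515245u + 12345u;
    return (uint64_t)(plan->seed >> 16) % (plan->max_delay_ms + 1);
}

/*
 * Return true once the current checkpoint has been running for the randomly
 * chosen delay. The delay is picked the first time a running checkpoint is
 * observed and discarded when the checkpoint ends.
 */
static bool
ckpt_should_crash(const ckpt_state *state, ckpt_crash_plan *plan, uint64_t now_ms)
{
    assert(state != NULL && plan != NULL);

    if (!state->running) {
        plan->armed = false; /* no checkpoint: forget any earlier schedule */
        return false;
    }
    if (!plan->armed) {
        plan->crash_at_ms = state->start_ms + next_delay_ms(plan);
        plan->armed = true;
    }
    return now_ms >= plan->crash_at_ms;
}
```

An event-driven variant would replace the time comparison with a check on a stage marker published by the checkpoint code, which is where some instrumentation in the subsystem would still be needed.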
Next Steps:
- Get this reviewed by the team for feasibility.
- Define the connection configuration field to toggle the crash thread.
- Implement the background thread and the mechanism for handling the function pointer for crash logic.
- Evaluate initial subsystem targets (e.g., checkpoints) for crash testing expansion.
- related to: WT-14690 Review test coverage of crash testing and plan a project to improve it (Closed)