Type: Epic
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Storage Engines - Foundations, Storage Engines - Persistence
Sprint:
SE Foundations - Q4+ Backlog
Story Points:
13
Epic Name:
Extend crash testing framework with a configurable background thread for randomized crash points

Currently, WiredTiger supports crash testing during checkpoints using the WT_SESSION.checkpoint.checkpoint_crash_point field. While this feature has been valuable for reproducing complex bugs involving crashes, it is limited in scope and requires adding crash-specific code to trigger a crash at a specific time. Furthermore, it can only crash between tables, not while the checkpoint is processing a table.

To expand crash testing capabilities and improve the reproducibility of issues across various subsystems, I propose implementing a generic background thread designed for crash testing.

Key Features:

Configurable activation:
- A new connection configuration field would enable/disable the crash testing thread.
Function pointer for crash conditions:
- The thread would be assigned a user-defined function pointer to determine appropriate crash conditions dynamically.
Subsystem-specific crash scenarios:
- For crash testing in the checkpoint subsystem, the thread could periodically check conditions such as whether a checkpoint has started and, based on those, trigger a crash when a timer expires or another rule is satisfied.
- Conditions could be time-based, event-driven, or a mix of both, depending on testing requirements.
Extensibility:
- The implementation would lay the groundwork for crash scenarios across other subsystems without requiring intrusive code changes in each subsystem.
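The features above could be sketched roughly as follows. This is a minimal illustration assuming POSIX threads, not a proposed implementation: every name in it (crash_thread, crash_predicate_fn, crash_after_n_polls, crash_dry_run and so on) is hypothetical and does not exist in WiredTiger, and the enabled flag merely stands in for the proposed connection configuration field.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep */

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* All names below are hypothetical; none of them are WiredTiger APIs. */
typedef bool (*crash_predicate_fn)(void *ctx); /* returns true when it is time to crash */
typedef void (*crash_action_fn)(void);         /* how to crash; abort() by default */

typedef struct {
    bool enabled;                    /* stand-in for the proposed connection config field */
    unsigned poll_interval_ms;       /* how often the predicate is evaluated */
    crash_predicate_fn should_crash; /* user-defined crash condition */
    void *ctx;                       /* opaque state handed to the predicate */
    crash_action_fn crash;           /* NULL means abort() */
    atomic_bool stop;                /* set to ask the thread to exit */
    atomic_bool fired;               /* set after a non-fatal (dry-run) action returns */
    pthread_t tid;
    bool started;
} crash_thread;

static void *
crash_thread_run(void *arg)
{
    crash_thread *ct = arg;
    struct timespec pause = {
        .tv_sec = ct->poll_interval_ms / 1000,
        .tv_nsec = (long)(ct->poll_interval_ms % 1000) * 1000000L
    };

    while (!atomic_load(&ct->stop)) {
        if (ct->should_crash(ct->ctx)) {
            ct->crash(); /* does not return when the action is abort() */
            atomic_store(&ct->fired, true);
            break;
        }
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Start the thread if enabled. Returns 0 on success or when disabled. */
static int
crash_thread_start(crash_thread *ct)
{
    int ret;

    assert(ct != NULL);
    atomic_init(&ct->stop, false);
    atomic_init(&ct->fired, false);
    ct->started = false;
    if (!ct->enabled || ct->should_crash == NULL)
        return 0;
    if (ct->crash == NULL)
        ct->crash = abort;
    ret = pthread_create(&ct->tid, NULL, crash_thread_run, ct);
    if (ret == 0)
        ct->started = true;
    return ret;
}

/* Ask the thread to exit and wait for it; a no-op if it never started. */
static void
crash_thread_stop(crash_thread *ct)
{
    if (!ct->started)
        return;
    atomic_store(&ct->stop, true);
    pthread_join(ct->tid, NULL);
    ct->started = false;
}

/* Example predicate: request a crash on the Nth poll. */
typedef struct {
    unsigned polls;
    unsigned limit;
} poll_budget;

static bool
crash_after_n_polls(void *ctx)
{
    poll_budget *b = ctx;
    return ++b->polls >= b->limit;
}

/* Example non-fatal action, useful for validating a predicate without crashing. */
static void
crash_dry_run(void)
{
}
```

In this sketch a subsystem only has to supply a predicate and its context; the polling loop and the crash action stay in one place, which is the non-intrusive property the proposal is after. Making the crash action replaceable is an extra assumption, added so that predicates can be exercised without killing the process.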

Benefits:

Increased coverage for crash testing, exposing edge cases across multiple subsystems.
Greater flexibility in designing crash scenarios.
Simplified implementation, reducing the need for subsystem-specific crash code.

Example Use Case:
In the checkpoint subsystem, the thread could detect when a checkpoint begins and schedule a crash at a random point within that process or based on specific stages/events during the checkpoint lifecycle.
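As an illustration of this use case, a crash condition for checkpoints might look like the sketch below. It assumes some way of observing that a checkpoint is running, represented here by a plain ckpt_running flag; that flag, the ckpt_crash_state structure and both functions are hypothetical and are not WiredTiger code. The current time is passed in rather than read from a clock, and the random delay comes from a seeded generator, so a failing run could be replayed from its seed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a checkpoint crash condition; not a WiredTiger structure. */
typedef struct {
    bool ckpt_running;     /* assumed to reflect whether a checkpoint is in progress;
                              would need to be atomic if written by another thread */
    bool armed;            /* true once a crash deadline has been chosen */
    uint64_t deadline_ms;  /* time at which the crash should be triggered */
    uint64_t max_delay_ms; /* upper bound on the random delay */
    uint64_t rng_state;    /* PRNG seed; must be non-zero for xorshift */
} ckpt_crash_state;

/* xorshift64: a small deterministic PRNG, used so runs can be replayed from a seed. */
static uint64_t
next_random(uint64_t *state)
{
    uint64_t x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/*
 * Returns true when a crash should be triggered. On the first call that sees a
 * running checkpoint, pick a random deadline within max_delay_ms; afterwards,
 * report true once that deadline has passed. The deadline is discarded when the
 * checkpoint finishes, so the next checkpoint gets a fresh one.
 */
static bool
ckpt_should_crash(ckpt_crash_state *st, uint64_t now_ms)
{
    if (!st->ckpt_running) {
        st->armed = false;
        return false;
    }
    if (!st->armed) {
        st->deadline_ms = now_ms + next_random(&st->rng_state) % (st->max_delay_ms + 1);
        st->armed = true;
    }
    return now_ms >= st->deadline_ms;
}
```

A purely time-based rule like this one can land anywhere inside a checkpoint, including while a table is being processed. An event-driven variant would replace the deadline comparison with a check on a stage marker, if the checkpoint code exposed one.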

Next Steps:

Get this reviewed by the team for feasibility.
Define the connection configuration field to toggle the crash thread.
Implement the background thread and the mechanism for handling the function pointer for crash logic.
Evaluate initial subsystem targets (e.g., checkpoints) for crash testing expansion.

Related to: WT-14690 Review test coverage of crash testing and plan a project to improve it (Closed)


Assignee: Albert Song
Reporter: Etienne Petrel
Votes: 0
Watchers: 5

Created: Jul 17 2025 01:05:07 AM UTC
Updated: Dec 04 2025 06:26:16 AM UTC
