- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- StorEng - Defined Pipeline
Test/format predictable replays get stuck with ops.throttling enabled.
However, the exact mechanism by which the two interact is unclear. Some current considerations:
- "Predictable Replays" differ from "regular" ones in a number of ways:
  - If an operation fails, it is not discarded but retried until it succeeds.
    - However, throttling is implemented outside of the retry loop.
  - In Predictable Replays, a dedicated timekeeping thread manages the current clock for all threads. If a thread runs ahead of the clock, it blocks and waits for the clock to catch up.
    - This mechanism should not be affected by throttling.
    - donald.anderson@mongodb.com: More precisely, test/format has a timestamp thread that moves the stable timestamp and the oldest timestamp according to the oldest operation in progress. This thread is present in all test/format runs (not just predictable ones).
- In Evergreen tests, each predictable replay iteration runs 3 times. When the first run ends by timeout, the script reads the resulting number of operations and the timestamp and passes them to the following 2 runs. The resulting database states are then compared across all three runs.
- When a "Predictable Replay" times out, it does so on the first run. It is unclear why, with a 3-minute timeout, the run fails to terminate within 15 minutes. All WiredTiger internal threads are in their idle states at the time of termination.
Because the cause is unknown, this may indicate a bug either in "Predictable Replay" or in WiredTiger itself.
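To make the first consideration easier to reason about, here is a minimal Python sketch of the three pieces described in the list above: a shared clock advanced by one timekeeper thread, a worker that blocks when it runs ahead of that clock, and a retry loop with throttling applied outside of it. This is a toy model and not test/format's code. All names (`Clock`, `run_worker`, `demo`, `throttle_s`, `wait_timeout_s`) are hypothetical, and the clock is a plain counter rather than the stable/oldest timestamps.

```python
import threading
import time


class Clock:
    """Shared logical clock advanced by a single timekeeper thread (toy model)."""

    def __init__(self):
        self._now = 0
        self._cond = threading.Condition()

    def advance(self, delta=1):
        with self._cond:
            self._now += delta
            self._cond.notify_all()

    def wait_until(self, ts, timeout):
        """Block until the clock reaches ts; return False if the wait times out."""
        with self._cond:
            return self._cond.wait_for(lambda: self._now >= ts, timeout=timeout)


def run_worker(clock, timestamps, op, throttle_s, wait_timeout_s=5.0):
    """Run one operation per timestamp and return the total number of attempts."""
    attempts = 0
    for ts in timestamps:
        # Assumption from the ticket: throttling sits outside the retry loop,
        # so it is paid once per operation, not once per attempt.
        time.sleep(throttle_s)
        # A worker that runs ahead of the clock blocks here until it catches up.
        if not clock.wait_until(ts, wait_timeout_s):
            raise TimeoutError(f"clock never reached {ts}")
        # Predictable mode: a failed operation is retried, never discarded.
        while True:
            attempts += 1
            if op(ts):
                break
    return attempts


def demo():
    clock = Clock()
    stop = threading.Event()

    def timekeeper():
        while not stop.is_set():
            clock.advance()
            time.sleep(0.001)

    seen = set()

    def flaky_op(ts):
        # Fail the first attempt for each timestamp, succeed on the second.
        if ts in seen:
            return True
        seen.add(ts)
        return False

    thread = threading.Thread(target=timekeeper, daemon=True)
    thread.start()
    try:
        return run_worker(clock, range(1, 6), flaky_op, throttle_s=0.001)
    finally:
        stop.set()
        thread.join()
```

In this model the worker always finishes, because the timekeeper advances unconditionally. A hang would require the clock to stop advancing (for example, if its progress depended on an operation that is itself throttled or blocked), which is one of the interactions still to be confirmed or ruled out.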
Also see comments on WT-11765.