There have been two recent reports from users whose system time apparently runs backward quite frequently. This theory is based on wildly large values in many of the time-related statistics in the diagnostic data, which is what an unsigned time difference produces when the end time is earlier than the start time. In both cases checkpoints hang, apparently because the code that scrubs the cache uses time to compute some of its conditions and itself hangs.
I believe we need:
1. The WT_TIMEDIFF macros need to check for end < begin and return a difference of 0 in that case. We already have WT_TIMECMP to make that comparison.
2. I will review the scrub code and attempt to reproduce the hang by manually forcing time to go backward in the code.
3. Thoroughly review the places where we depend on time differences for more than statistics, such as the timestamp field (in epoch seconds) recorded for a checkpoint in the metadata. Can something bad happen if a later checkpoint has an earlier value?
Issue links:
- is depended on by: WT-3331 Fix a potential hang if system clock jumped backward (Closed)
- is duplicated by: SERVER-29230 Journal files accumulating on ReplicaSet Secondary (Closed)
- is related to: SERVER-29102 WiredTiger does not rotate journal log files (Closed)