WiredTiger / WT-3327

Checkpoints can hang if time runs backward

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.3, 3.2.17, 3.4.6, 3.5.9
    • Labels:
      None
    • Sprint:
      Storage 2017-05-29
    • Backport Requested:
      v3.2

      Description

      Two recent user reports suggest that time is running backward quite frequently on their systems. This theory is based on wildly large values in many of the time-related statistics in the diagnostic data. In both cases checkpoints hang, apparently because the cache-scrubbing code hangs: it uses time to compute some of its conditions.
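      One plausible way such wildly large values arise, assuming the time differences are computed in unsigned arithmetic (the ticket does not show the macro definitions), is wraparound when the end time is earlier than the begin time. The helper name below is hypothetical and only illustrates the arithmetic:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not WiredTiger code: a naive elapsed-seconds
 * computation that does not check the ordering of its arguments.
 */
static uint64_t
naive_elapsed_secs(uint64_t begin, uint64_t end)
{
    /* Unsigned subtraction wraps modulo 2^64 when end < begin. */
    return (end - begin);
}
```

      If the clock steps back by 5 seconds between the two readings, this returns 2^64 - 5 rather than a small number, so any condition that compares the result against a threshold sees an enormous elapsed time.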

      I believe we need:
      1. The WT_TIMEDIFF macros need to check for end < begin and compute a 0 diff in that case. We already have WT_TIMECMP for that.
      2. I will review the scrub code and attempt to reproduce the hang by manually forcing time to go backward in the code.
      3. A thorough review of places where we depend on time differences for more than statistics, such as the timestamp field (epoch seconds) of a checkpoint in the metadata: can something bad happen if the later checkpoint has an earlier value?
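      Item 1 can be sketched as follows. This is a minimal illustration, not the actual WT_TIMEDIFF or WT_TIMECMP definitions; the function names `ts_before` and `timediff_ns_clamped` are hypothetical, and the sketch assumes timestamps are held in a `struct timespec`:

```c
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC UINT64_C(1000000000)

/* Hypothetical comparison helper: nonzero if a is strictly earlier than b. */
static int
ts_before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return (a->tv_sec < b->tv_sec);
    return (a->tv_nsec < b->tv_nsec);
}

/*
 * Hypothetical clamped difference: nanoseconds from begin to end, or 0 if
 * the clock appears to have moved backward (end earlier than begin).
 */
static uint64_t
timediff_ns_clamped(const struct timespec *end, const struct timespec *begin)
{
    if (ts_before(end, begin))
        return (0);
    /* Safe here: end >= begin, so the unsigned arithmetic cannot wrap. */
    return ((uint64_t)(end->tv_sec - begin->tv_sec) * NS_PER_SEC +
        (uint64_t)end->tv_nsec - (uint64_t)begin->tv_nsec);
}
```

      Clamping to 0 keeps statistics and thresholds sane, but it does not by itself guarantee progress in a loop that waits for a minimum elapsed time, which is why items 2 and 3 still call for a review of the callers.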

              People

              • Assignee:
                sue.loverso Sue LoVerso
                Reporter:
                sue.loverso Sue LoVerso
