Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-9607

Fix timing of compact progress messages

    • 3
    • c(3x10^8)-StorEng - 2023-11-14

      When WT_SESSION::compact() runs with verbose=[compact_progress:1], it tries to generate a progress message roughly every 20 seconds. But in some circumstances it can go a long time without reporting progress and then output multiple progress messages per second.

      Here is example output from MongoDB attempting to compact a 160GB index table. (I've condensed the MongoDB log messages to only include the timestamp and WT log message.)

      2022-07-02T00:01:42.059+00:00: ============ testing for compaction
      2022-07-02T00:01:42.059+00:00: file size 160587MB (168388661248) with 24% space available 39573MB (41495416832)
      2022-07-02T00:01:43.605+00:00:  0%:         1292MB, (free: 1355661312B, 3%), (used: 14765MB, 15483204813B, 12%)
      2022-07-02T00:01:43.605+00:00: 10%:         1755MB, (free: 1841027584B, 4%), (used: 14303MB, 14997838541B, 11%)
      2022-07-02T00:01:43.605+00:00: 20%:         2583MB, (free: 2708723200B, 6%), (used: 13475MB, 14130142925B, 11%)
      2022-07-02T00:01:43.605+00:00: 30%:         2654MB, (free: 2783047680B, 6%), (used: 13404MB, 14055818445B, 11%)
      2022-07-02T00:01:43.605+00:00: 40%:         2193MB, (free: 2299588608B, 5%), (used: 13865MB, 14539277517B, 11%)
      2022-07-02T00:01:43.605+00:00: 50%:         2986MB, (free: 3132018688B, 7%), (used: 13071MB, 13706847437B, 10%)
      2022-07-02T00:01:43.605+00:00: 60%:         4219MB, (free: 4424004096B, 10%), (used: 11839MB, 12414862029B, 9%)
      2022-07-02T00:01:43.605+00:00: 70%:         6071MB, (free: 6366371328B, 15%), (used: 9987MB, 10472494797B, 8%)
      2022-07-02T00:01:43.605+00:00: 80%:         6576MB, (free: 6896386048B, 16%), (used: 9481MB, 9942480077B, 7%)
      2022-07-02T00:01:43.605+00:00: 90%:         9239MB, (free: 9688588288B, 23%), (used: 6819MB, 7150277837B, 5%)
      2022-07-02T00:01:43.738+00:00: index-15--2947465233479438924.wt: total reviewed 0 pages, total skipped 0 pages, total wrote 0 pages
      2022-07-02T00:01:43.739+00:00: index-15--2947465233479438924.wt: 23756MB (24910442496) available space in the first 80% of the file
      2022-07-02T00:01:43.739+00:00: index-15--2947465233479438924.wt: 30333MB (31806828544) available space in the first 90% of the file
      2022-07-02T00:01:43.739+00:00: index-15--2947465233479438924.wt: require 10% or 16058MB (16838866124) in the first 90% of the file to perform compaction, compaction proceeding
      2022-07-02T00:01:44.173+00:00:  compacting index-15--2947465233479438924.wt for 574 seconds; reviewed 0 pages, skipped 0 pages, rewritten 0pages
      2022-07-02T00:01:44.551+00:00:  compacting index-15--2947465233479438924.wt for 574 seconds; reviewed 1 pages, skipped 1 pages, rewritten 0pages
      2022-07-02T00:01:44.963+00:00:  compacting index-15--2947465233479438924.wt for 575 seconds; reviewed 1 pages, skipped 1 pages, rewritten 0pages
      2022-07-02T00:01:45.445+00:00:  compacting index-15--2947465233479438924.wt for 575 seconds; reviewed 2 pages, skipped 2 pages, rewritten 0pages
      2022-07-02T00:01:45.861+00:00:  compacting index-15--2947465233479438924.wt for 576 seconds; reviewed 3 pages, skipped 3 pages, rewritten 0pages
      2022-07-02T00:01:46.266+00:00:  compacting index-15--2947465233479438924.wt for 576 seconds; reviewed 6 pages, skipped 6 pages, rewritten 0pages
      2022-07-02T00:01:46.635+00:00:  compacting index-15--2947465233479438924.wt for 577 seconds; reviewed 6 pages, skipped 6 pages, rewritten 0pages
      2022-07-02T00:01:46.995+00:00:  compacting index-15--2947465233479438924.wt for 577 seconds; reviewed 8 pages, skipped 8 pages, rewritten 0pages
      2022-07-02T00:01:47.365+00:00:  compacting index-15--2947465233479438924.wt for 577 seconds; reviewed 9 pages, skipped 9 pages, rewritten 0pages
      

      After deciding to proceed with the compact, we almost immediately start outputting multiple progress messages in rapid succession.

      These messages are generated by __wt_block_compact_progress(). The problem is that way the function is written if it falls behind, it will print messages every time it is called until it "catches up". 

      __wt_epoch(session, &cur_time);
      
      /* Log one progress message every twenty seconds. */
      time_diff = WT_TIMEDIFF_SEC(cur_time, session->compact->begin);
      if (time_diff / WT_PROGRESS_MSG_PERIOD > *msg_countp) {
          ++*msg_countp;
          __wt_verbose_debug(session, WT_VERB_COMPACT_PROGRESS,
            " compacting %s for %" PRIu64 " seconds; reviewed %" PRIu64 " pages, skipped %" PRIu64
            " pages, rewritten %" PRIu64 "pages",
            block->name, time_diff, block->compact_pages_reviewed, block->compact_pages_skipped,
            block->compact_pages_rewritten);
      } 

      The problem in the example cited above is that when we first call __wt_block_compact_progress() over 500 seconds have already elapsed (for reasons I don't understand), but msg_countp is apparently very small, so on every call we print a message.

            Assignee:
            sean.watt@mongodb.com Sean Watt
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: