WiredTiger / WT-4882

Improve checkpoint performance when there are large metadata pages


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT3.2.1, 4.3.1, 4.2.0-rc3, 4.0.13
    • Component/s: None
    • Labels: None
    • Sprint: Storage Engines 2019-07-01
    • Backport Requested: v4.2, v4.0

      Description

      We noticed that large pages in the metadata can make checkpoints complete very slowly. We also don't attempt update-restore eviction of metadata pages, which limits how often metadata pages can be evicted successfully and could lead to performance issues.

      A wtperf workload that demonstrates the slow checkpoint behavior is:

      $ cat bench/wtperf/runners/metadata-split-test.wtperf
      # Create a set of tables with uneven distribution of data
      conn_config="cache_size=1G,eviction=(threads_max=8),file_manager=(close_idle_time=100000),checkpoint=(wait=2000,log_size=2GB),statistics_log=(wait=1,json,on_close),session_max=1000"
      table_config="type=file,app_metadata=\"this_is_a_fairly_long_string_to_cause_splits_in_metadata_more_often_abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyabcdefghijklmnopqrstuvwxyzzzzzzzz\""
      table_count=2000
      icount=0
      random_range=1000000000
      pareto=10
      range_partition=true
      report_interval=5
       
      run_ops=0
      populate_threads=0
      icount=0
      

      It should be relatively straightforward to translate that into a Python test case. The difficulty is that the symptom is the checkpoint on close taking a long time, and asserting on "a long time" in a Python test is traditionally hard to make robust under automated testing. We'd need to find a different signal that the behavior is wrong.
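
A minimal sketch of what that translation might look like, assuming the standard `wiredtiger` Python bindings (`wiredtiger_open`, `open_session`, `create`, `close`). The URI naming scheme, the helper names, the `key_format`/`value_format` settings and the shortened `app_metadata` filler are illustrative assumptions, not part of the wtperf config. The sketch only reports how long the close takes; it deliberately asserts nothing about timing, since choosing a robust signal is the open question.

```python
import time

# Stand-in for the long app_metadata value in the wtperf config. The exact
# text is an assumption; it only needs to be long enough to bloat metadata.
APP_METADATA = (
    "this_is_a_fairly_long_string_to_cause_splits_in_metadata_more_often_"
    + "abcdefghijklmnopqrstuvwxyz" * 9
)


def table_config(app_metadata=APP_METADATA):
    """Build a WT_SESSION.create config string modelled on table_config."""
    # key_format/value_format are assumptions added for illustration.
    return 'type=file,key_format=S,value_format=S,app_metadata="%s"' % app_metadata


def table_uris(count):
    """Return one URI per table; the naming scheme is hypothetical."""
    return ["table:metadata_split_%05d" % i for i in range(count)]


def run_workload(home, count=2000):
    """Create `count` empty tables in `home`, return seconds spent in close."""
    import wiredtiger  # imported lazily so the helpers above work without it

    conn = wiredtiger.wiredtiger_open(home, "create,cache_size=1G")
    session = conn.open_session()
    config = table_config()
    for uri in table_uris(count):
        session.create(uri, config)

    start = time.time()
    conn.close()  # the checkpoint on close is the slow step described above
    return time.time() - start
```

Calling `run_workload()` on an empty directory requires the `wiredtiger` module to be importable. The returned duration is for manual inspection only.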

              People

              • Assignee: Michael Cahill (michael.cahill)
              • Reporter: Alexander Gorrod (alexander.gorrod)