[WT-2868] Add sample_interval to checkpoint-stress wtperf config Created: 29/Aug/16  Updated: 12/Oct/17  Resolved: 06/Sep/16

Status: Closed
Project: WiredTiger
Component/s: None
Affects Version/s: None
Fix Version/s: WT2.9.0, 3.3.12, 3.2.10

Type: Bug
Priority: Major - P3
Reporter: Sulabh Mahajan
Assignee: Sulabh Mahajan
Resolution: Fixed
Votes: 0
Labels: None


 Description   

Add sample_interval to the checkpoint-stress wtperf configuration so that wtperf generates a monitor file for gathering throughput and latency statistics.



 Comments   
Comment by Githook User [ 29/Aug/16 ]

Author: Sulabh Mahajan (sulabhM) <sulabh.mahajan@mongodb.com>

Message: WT-2868 Add sample_interval to checkpoint-stress.wtperf (#2989)
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/f07c61492f69a45d566980a31b46f0044093bb37

Comment by Sulabh Mahajan [ 29/Aug/16 ]

Sue LoVerso I have updated the Jenkins job perf-long to run checkpoint-stress.wtperf and generate graphs of throughput, number of updates and checkpoint count. Although I ran my changes separately, the job has yet to execute a full run.

Please take a look and let me know if you have any suggestions on improving this change to the Jenkins job.

Comment by Githook User [ 29/Aug/16 ]

Author: Sulabh Mahajan (sulabhM) <sulabh.mahajan@mongodb.com>

Message: WT-2868 Add sample_interval to checkpoint-stress.wtperf (#2989)
Branch: mongodb-3.4
https://github.com/wiredtiger/wiredtiger/commit/f07c61492f69a45d566980a31b46f0044093bb37

Comment by Githook User [ 29/Aug/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger: 7d3c0f9f50862798270cf38663255202e5bcf3fd from branch mongodb-3.4

ref: 2566118fc6..7d3c0f9f50
for: 3.3.12

WT-2865 eviction thread error failure
WT-2868 Add sample_interval to checkpoint-stress wtperf config
WT-2869 Performance regression on secondaries
Branch: master
https://github.com/mongodb/mongo/commit/67833ec4180ecc3057f34cb243e2006d4ee4c56d

Comment by Sue LoVerso [ 29/Aug/16 ]

Sulabh Mahajan The run completed. I have a couple suggestions that I'll let you fix so that you learn those parts of Jenkins:

  • Overall I have reservations about including the checkpoint-stress numbers in the collective latency calculations. The other 4 tests are all related to each other in configuration, duration and data and run for a couple hours. This new test is unrelated to those. However, I'm a bit on the fence because adding a lot of new plots gets confusing too. I am okay with it individually added to the Max Latency chart as long as it is in the same ballpark, but it may skew the "total number of warnings" chart. (However, it isn't doing that yet - see next bullet.)
  • You need to add a setting max_latency=2000 to checkpoint-stress.wtperf in order to get any latency warning messages. So, for now it isn't contributing to that value. While you're in there please move the sample* lines so they're alphabetized. Thanks.
  • In Jenkins, you need to add a label to all your data. In each plot definition, when you pick "Load data from properties file" it will bring up a box labelled "Data series legend label" for you to type in what the data is. This is particularly important in any plot with more than one data item such as the Max Latency plot.
  • In Jenkins, when it displays the plots (when you select the "Plots" link to see the charts) it displays them alphabetically by "Plot title". That is why I named the others "Test1" - "Test4". But again, those 4 tests are related to each other, using the same data initially created in Test1, the populate phase. You don't need to call yours "Test5" and that might be misleading since it isn't related to the other four. However, I don't have a good suggestion because if you want it at the bottom after the other 4, then anything I come up with inserts it in the beginning or middle.
  • In order to see the plots on http://source.wiredtiger.com/jenkins/plots/ you need to add entries for them in the wiredtiger/jenkins/plots/index.mh file. (I.e. that is the repo: https://github.com/wiredtiger/wiredtiger.github.com). They're simply numbered so add another line for each additional plot you add. Then run build.sh to generate the HTML. That requires pandoc so if you don't have that on your system just let me know and I can add the plots there.
  • I did fix a typo in the property filenames in the checkpoint stress min/max throughput plot. But there is no data in the csv file yet because of the typo.
  • You could eliminate one step in getting the update min/max throughput numbers by doing cut -d ' ' -f 5 instead of the two cut commands you're using.
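The single-cut suggestion in the last bullet can be sketched as follows. The input line and the two-step pipeline are hypothetical stand-ins, since the actual commands used in the Jenkins job are not shown in this ticket; only `cut -d ' ' -f 5` comes from the discussion.

```shell
# Hypothetical space-separated line; this is NOT actual wtperf output.
line='f1 f2 f3 f4 f5 f6 f7'

# Two-step form (an assumed example of chaining two cut commands):
# keep fields 5 onward, then keep the first of those.
two_step=$(printf '%s\n' "$line" | cut -d ' ' -f 5- | cut -d ' ' -f 1)

# One-step form from the suggestion: select field 5 directly.
one_step=$(printf '%s\n' "$line" | cut -d ' ' -f 5)

echo "two_step=$two_step one_step=$one_step"
```

Note that cut treats every single space as a delimiter, so a run of several spaces in the input shifts the field numbers.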

But thanks for adding this!

Comment by Sulabh Mahajan [ 30/Aug/16 ]

Sue LoVerso The following addresses your comments:

  • I discussed with Alex, and we are of the opinion that we do not need latency measurements for the checkpoint-stress test. So I removed all latency-related changes.
  • I will alphabetize the sample* lines in the wtperf file as a separate change
  • Added legend label to graphs that were missing it
  • Renamed from Test5 to Test_Checkpoint_Stress, which keeps these graphs at the bottom
  • I have added the graph to the jenkins/plots/index.mh file in the wiredtiger.github.com repo
  • Shortened to cut -d ' ' -f 5 to get min/max throughput

Thanks for helping me out with this one.

Comment by Sue LoVerso [ 30/Aug/16 ]

I discussed with Alex, and we are of the opinion that we do not need latency measurements for the checkpoint-stress test. So I removed all latency-related changes.

I'm going to push back and ask both you and Alexander Gorrod what the point of adding the sample* lines is if you're neither measuring nor plotting any latency. The current new plots are 1. number of checkpoints, 2. update counts, 3. min/max throughput. None of that requires the monitor thread that records latencies into the monitor file. The other thing the monitor thread can measure and warn about is a minimum throughput, but we don't use that anywhere right now.

I really like having the number-of-checkpoints plot. I think it is telling that we're only completing 1 checkpoint in 10 minutes.

So I'll put these additional comments out there:

  • If we don't want latency measurements, then you can remove the sample lines. I can agree that the long test1-4 measures that sort of thing already and does checkpoints once per minute as well. It does not have an update-only workload though.
  • However, I'll point out that long latencies have been more common around the syncs of checkpoints. So it can be interesting.
  • Are we trying to get close to how MongoDB uses WT? If so, I have more suggestions on the config file:
    • Add 4 eviction threads to the connection config.
    • Increase cache size to 16GB (half of the memory of the AWS perf machine we use).
    • For table config, use leaf_page_max=16k,memory_page_max=10M.
    • Consider turning on fast, json statistics.
  • If we're not trying to resemble MongoDB usage, then a comment for the values in the config file would be helpful to know why/how those numbers were chosen.
  • I thought of this while typing the above: why not make this test another one related to the other 500m tests, as an update-only version? Then it does fit in with the other plots and already has the MongoDB-related setup. Its run-time would have to increase as well. I'm kind of liking this idea. If you do this we should figure out how to best incorporate the number of checkpoints information you added (i.e. just for this test, or sum from all tests, and how to know what a good/bad number is, etc). What do you think?
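As a rough illustration of the MongoDB-like settings suggested in the list above, the following sketch writes a wtperf config fragment to a temporary file. The option spellings (`eviction=(threads_max=...)`, `statistics=(fast)`, `statistics_log=(json,wait=...)`) are assumptions based on WiredTiger's configuration-string syntax and should be checked against the wtperf version in use; the `wait=30` value is a placeholder, not something taken from this ticket.

```shell
# Temporary file holding the sketch; the file name is arbitrary.
cfg=$(mktemp)

# Connection settings: 16G cache, 4 eviction threads, fast statistics
# logged as JSON (wait=30 is a placeholder interval, an assumption).
# Table settings: the page sizes suggested in the comment above.
cat > "$cfg" <<'EOF'
conn_config="cache_size=16G,eviction=(threads_max=4),statistics=(fast),statistics_log=(json,wait=30)"
table_config="leaf_page_max=16k,memory_page_max=10M"
EOF

cat "$cfg"
```

These two lines would sit alongside the workload options already present in checkpoint-stress.wtperf, which are not reproduced here.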

Comment by Sue LoVerso [ 30/Aug/16 ]

FYI I hand-edited the csv files so that the plots are consistently labeled with your new labels and they look like you'd expect.

Comment by Sulabh Mahajan [ 31/Aug/16 ]

Sue LoVerso Thanks for the input. I will work on this and get back to you.

Comment by Sulabh Mahajan [ 06/Sep/16 ]

Sue LoVerso, the changes to the wtperf file are under review:

  • We decided to keep the sample* lines for possible future use. I can remove them if you feel otherwise.
  • We are not necessarily imitating how MongoDB uses WT, but I have added eviction threads and increased the cache size.
  • Since this test is partly intended for performance measurements, I am inclined not to turn on statistics until needed. Let me know if you feel otherwise.
  • This wtperf configuration is mostly based on the test that was used for WT-2389, to keep track of any performance regression in the number of updates with stressed checkpoints.
    I am not sure whether it fits well with any of the other tests, so for now I am inclined not to merge it with them.

Comment by Githook User [ 12/Sep/16 ]

Author: Sulabh Mahajan (sulabhM) <sulabh.mahajan@mongodb.com>

Message: WT-2868 Add sample_interval to checkpoint-stress.wtperf (#2989)
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/f07c61492f69a45d566980a31b46f0044093bb37

Comment by Githook User [ 13/Sep/16 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Import wiredtiger: 911c940adab547d36ac305fc627a79e637fa3c40 from branch mongodb-3.2

ref: dddca65..911c940ada
for: 3.2.10

SERVER-24971 Excessive memory held by sessions when application threads do evictions
SERVER-25843 Coverity analysis defect 99856: Redundant test
SERVER-25845 Coverity analysis defect 99859: Explicit null dereferenced
SERVER-25846 Coverity analysis defect 99861: Dereference after null check
WT-1162 Add latency to Jenkins wtperf tests and plots
WT-2026 Maximum pages size at eviction too large
WT-2221 Document which statistics are available via a "fast" configuration vs. an "all" configuration
WT-2233 Investigate changing when the eviction server switches to aggressive mode.
WT-2239 Make sure LSM cursors read up to date dsk_gen, it was racing with compact
WT-2323 Allocate a transaction id at the beginning of join cursor iteration
WT-2353 Failure to create async threads as part of a wiredtiger_open call will cause a hang
WT-2380 Make scripts fail if code doesn't match style
WT-2486 Update make check so that it runs faster
WT-2555 make format run on Windows
WT-2578 remove write barriers from the TAILQ_INSERT_XXX macros
WT-2631 nullptr is passed for parameters marked with attribute non-null
WT-2638 ftruncate may not be supported
WT-2645 wt dump: push the complexity of collecting metadata into a dump cursor
WT-2648 cache-line alignment for new ports
WT-2665 Limit allocator fragmentation in WiredTiger
WT-2678 The metadata should not imply that an empty value is true
WT-2688 configure --enable-python doesn't check for availability of swig
WT-2693 Check open_cursor error paths for consistent handling
WT-2695 Integrate s390x accelerated crc32c support
WT-2708 split child-update race with reconciliation/eviction
WT-2711 Change statistics log configuration options
WT-2719 add fuzz testing for WiredTiger options and reconfiguration.
WT-2728 Don't re-read log file headers during log_flush
WT-2729 Focus eviction walks in largest trees
WT-2730 cursor next/prev can return the wrong key/value pair when crossing a page boundary
WT-2731 Raw compression can create pages that are larger than expected
WT-2732 Coverity analysis defect 99665: Redundant test
WT-2734 Improve documentation of eviction behavior
WT-2737 Scrub dirty pages rather than evicting them
WT-2738 Remove the ability to change the default checkpoint name
WT-2739 pluggable file systems documentation cleanups
WT-2743 Thread count statistics always report 0
WT-2744 partial line even with line buffering set
WT-2746 track checkpoint I/O separately from eviction I/O
WT-2751 column-store statistics incorrectly calculates the number of entries
WT-2752 Fixes to zipfian wtperf workload config
WT-2755 flexelint configuration treats size_t as 4B type
WT-2756 Upgrade the autoconf archive package to check for swig 3.0
WT-2757 Column tables behave differently when column names are provided
WT-2759 Releasing the hot-backup lock doesn't require the schema lock.
WT-2760 Fix a bug in backup related to directory sync. Change the filesystem API to make durable the default
WT-2762 wtstats tool fails if checkpoint runs
WT-2763 Unit test test_intpack failing on OSX
WT-2764 Optimize checkpoints to reduce throughput disruption
WT-2765 wt dump: indices need to be shown in the dump output
WT-2766 Don't count eviction of lookaside file pages for the purpose of checking stuck cache
WT-2767 test suite needs way to run an individual scenario
WT-2769 Update documentation to reflect correct limits of memory_page_max
WT-2770 Add statistics tracking schema operations
WT-2772 Investigate log performance testing weirdness
WT-2773 search_near in indexes does not find exact matches
WT-2774 minor cleanups/improvements
WT-2778 Python test suite: make scenario initialization consistent
WT-2779 Raw compression created unexpectedly large pages on disk
WT-2781 Enhance bulk cursor option with an option to return immediately on contention
WT-2782 Missing a fs_directory_list_free in ex_file_system.c
WT-2783 wtperf multi-btree.wtperf dumps core on Mac
WT-2785 Scrub dirty pages rather than evicting them: single-page reconciliation
WT-2787 Include src/include/wiredtiger_ext.h is problematic
WT-2788 Java: freed memory overwrite during handle close can cause JNI crash
WT-2791 Enhance OS X Evergreen unit test
WT-2793 wtperf config improvements
WT-2795 Update documentation around read-only configuration
WT-2796 Memory leak in reconciliation uncovered by stress testing
WT-2798 Crash vulnerability with nojournal after create during checkpoint
WT-2800 Illegal file format in test/format on PPC
WT-2801 Crash vulnerability from eviction of metadata during checkpoint
WT-2802 Transaction commit causes heap-use-after free
WT-2803 Add verbose functionality to WT Evergreen tests
WT-2804 Don't read values in a tree without a snapshot
WT-2805 Infinite recursion if error streams fail
WT-2806 wtperf allocation size off-by-one
WT-2807 Switch Jenkins performance tests to tcmalloc
WT-2811 Reconciliation asserts that transaction time has gone backwards
WT-2812 Error when reconfiguring cache targets
WT-2813 small cache usage stuck even with large cache
WT-2814 Enhance wtperf to support single-op truncate mode
WT-2816 Improve WiredTiger eviction performance
WT-2817 Investigate performance regression in develop, add workload to wtperf/runners
WT-2818 The page visibility check when queuing pages for eviction is overly restrictive
WT-2820 add gcc warn_unused_result attribute
WT-2822 panic mutex and other functions that cannot fail
WT-2823 support file handles without a truncate method
WT-2824 wtperf displays connection and table create configurations twice
WT-2826 clang38 false positive on uninitialized variable.
WT-2827 checkpoint log_size configuration improvements
WT-2828 Make long wtperf tests reflect mongoDB usage
WT-2829 Switch automated testing to use enable-strict configure option
WT-2832 Python test uses hard-coded temporary directory
WT-2834 Join cursor: discrepancy with bloom filters
WT-2835 WT_CONNECTION.leak-memory can skip memory map and cache cleanup
WT-2838 Don't free session handles on close if leak memory is configured
WT-2839 lint: Ignoring return value of function
WT-2840 clang analysis: garbage values
WT-2841 Jenkins Valgrind runner is reporting errors in test wt2719_reconfig
WT-2842 split wtperf's configuration into per-database and per-run parts
WT-2843 Fix a bug in recovery if there is no filesystem truncate support
WT-2846 Several bugs related to reconfiguring eviction server at runtime
WT-2847 Merge fair locks into read/write locks.
WT-2850 clang 4.1 attribute warnings when building
WT-2853 Multi threaded reader writer example shows temporary slowdown or lockup
WT-2857 POSIX ftruncate calls should be #ifdef'd HAVE_FTRUNCATE
WT-2862 Fix lint error in test case for forced eviction with multiple cursors
WT-2863 Support UTF-8 paths on Windows
WT-2865 eviction thread error failure
WT-2866 Eviction server algorithm tuning
WT-2867 Review and fix barrier usage in __lsm_tree_close
WT-2868 Add sample_interval to checkpoint-stress wtperf config
WT-2869 Performance regression on secondaries
WT-2870 Rename wtperf checkpoint schema jobs
WT-2871 __wt_verbose has the wrong GCC format attributes
WT-2872 Recent stuck cache test/stress failures.
WT-2873 Refactor CRC32 code
WT-2875 Test test_wt2853_perf can run too long under valgrind
WT-2876 Extend wtperf to support a log like table
WT-2878 Verbose changes affected performance
WT-2881 Add -Wpedantic to clang compiler warning flags
WT-2883 wiredtiger_open with verbose=handleops recursive loop
WT-2885 __wt_checkpoint_signal lint
WT-2886 Decide how in-memory configuration and eviction_dirty_target interact
WT-2888 Switch functions to return void where possible
WT-2892 hot backup can race with block truncate
WT-2896 Coverity #1362535: resource leak
WT-2897 Checkpoints can become corrupted on failure
WT-2901 Add option to disable checkpoint dirty stepdown phase
WT-2903 Reduce the impact of checkpoint scrubbing on applications
Branch: v3.2
https://github.com/mongodb/mongo/commit/7d2acd6395ec84beca34718a75371bc11f0c9f60

Generated at Sun Nov 19 12:37:36 UTC 2017 using JIRA 7.2.10#72012-sha1:2651463a07e52d81c0fcf01da710ca333fcb42bc.