[SERVER-43804] Establish baseline performance metrics for startup, shutdown, initiation portions of ReplSetTest and ShardingTest Created: 03/Oct/19 Updated: 20/Oct/19 Resolved: 17/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | |
| Issue Links: | |
| Sprint: | Repl 2019-10-21 |
| Participants: | |
| Description |
|
To measure the success of the Faster Local Testing project, we should, before making any changes, measure the execution times of the startup, shutdown, and initiation phases of ReplSetTest and ShardingTest across all tests in the replica_sets and sharding suites. We should record the durations of the relevant sections of code for each test so that we can rerun this experiment after the project is complete and see how performance improved for each section (e.g. ReplSetTest.startSet, ShardingTest.stop). |
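One way to collect such per-test durations is to wrap each fixture phase in a timer and dump the rows to CSV for later analysis. Below is a minimal sketch in Python; the class name, phase labels, and CSV layout are all illustrative assumptions, not the actual resmoke/ReplSetTest instrumentation.

```python
import csv
import os
import tempfile
import time
from contextlib import contextmanager

# Hypothetical sketch: names and CSV layout are illustrative,
# not the real test-fixture instrumentation.
class PhaseTimer:
    """Records (test, phase, duration_ms) rows for fixture phases."""

    def __init__(self):
        self.rows = []

    @contextmanager
    def measure(self, test, phase):
        # Time the wrapped block with a monotonic clock.
        start = time.monotonic()
        try:
            yield
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000.0
            self.rows.append((test, phase, elapsed_ms))

    def write_csv(self, path):
        # Persist all recorded rows for offline charting.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["test", "phase", "duration_ms"])
            writer.writerows(self.rows)

timer = PhaseTimer()
with timer.measure("initiate_basic.js", "ReplSetTest.startSet"):
    pass  # fixture startup would run here
out = os.path.join(tempfile.gettempdir(), "replica_sets_timings.csv")
timer.write_csv(out)
```

Running the same harness before and after the project gives two CSVs whose per-phase distributions can be compared directly.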
| Comments |
| Comment by William Schultz (Inactive) [ 20/Oct/19 ] |
|
After some more investigation, I think that the distinct profile of stopSet performance can be explained by the fact that stopSet execution time includes the time we take to validate each replica set node. If validation is one of the slowest parts of stopSet, then the amount of data written during a test may affect its shutdown time much more than the size of the replica set does. |
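One way to check this hypothesis would be to correlate stopSet duration with the amount of data each test wrote versus the number of nodes it used. A sketch, with entirely made-up sample numbers (none of these figures come from the ticket's data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-test samples: (MB written, node count, stopSet ms).
samples = [(5, 3, 900), (50, 3, 2400), (120, 3, 5200), (10, 1, 700), (80, 2, 3600)]
mb, nodes, stop_ms = (list(col) for col in zip(*samples))
print(f"corr(data written, stopSet) = {pearson(mb, stop_ms):.2f}")
print(f"corr(node count,   stopSet) = {pearson(nodes, stop_ms):.2f}")
```

If the hypothesis holds, the first correlation should dominate the second on real data.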
| Comment by William Schultz (Inactive) [ 17/Oct/19 ] |
|
After some discussion about these metrics, there are a few other follow-up experiments that might be interesting to run in the future. Adding them here for reference:
|
| Comment by William Schultz (Inactive) [ 17/Oct/19 ] |
|
judah.schvimer I turned off the icecream daemon (sudo systemctl stop iceccd.service) before running the benchmarks on my local machine. There may be other factors affecting the performance in some way, but I will leave that to a follow-up investigation. I do not want to deal with the risk of uploading potentially private credentials to a public repo, so I am attaching a zip archive that contains a snapshot of the scripts I used: faster-local-testing-snapshot.zip
|
| Comment by Jason Chan [ 15/Oct/19 ] |
|
The above report LGTM. |
| Comment by William Schultz (Inactive) [ 14/Oct/19 ] |
|
jason.chan If you could review the above report and confirm it is all clear to you, that would be great. |
| Comment by William Schultz (Inactive) [ 14/Oct/19 ] |
Performance Baseline

The attached charts display data for setup and teardown performance of ReplSetTest and ShardingTest across different Evergreen variants and on my local workstation. These results are based on runs from this Evergreen version and this patch build. The full data sets used to generate these charts are also attached: replica_sets.csv

Each individual plot is a histogram: the x-axis is the duration of that portion of the test fixture in milliseconds, and the y-axis is the number of tests whose relevant metric fell into that execution-time bucket. The upper limit of the x-axis is capped at 10000ms to exclude significant outliers.

To help interpret the data, each histogram bar is colored to indicate the number of nodes (or shards) used in the test the data point came from. For example, for ReplSetTest.startSet there are tight groupings corresponding to the most common replica set sizes used in tests, i.e. 1, 2, and 3 nodes. Startup times appear to scale roughly linearly with the number of nodes.

The ReplSetTest.stopSet histogram profile is more spread out. I'm not sure exactly why, but perhaps tests that write more data take longer to checkpoint that data when shutting down, making shutdown time dependent on the amount of data written. For startup, nodes should nearly always be starting with a clean data directory, so presumably there is less variability in those execution times.

The histogram profiles are quite consistent across different Evergreen variants. It is worth pointing out that this Evergreen data is compiled from test runs with a reasonably high level of parallelization, i.e. multiple tests running on the same machine at once. This doesn't give an entirely accurate view of single-test latency, but my hope is that it's a reasonable proxy.
If a machine is overloaded, I would expect startup/shutdown times to only be slower than on an unloaded machine, so if anything these numbers should act as an upper bound for local test execution times. For completeness, I also ran one replica_sets suite on a RHEL 6.2 spawn host with no parallelization (1 resmoke job), notated as "enterprise-rhel-62-64-bit-sequential". Even with a single job the histogram profile isn't dramatically different, so I think it's OK to use the metrics from parallelized suites as a reasonable approximation of single-test latencies.

The histogram profile when running on my local Ubuntu workstation is a noticeable anomaly (see the "ubuntu_1610_local_workstation" subplots): its median across all plots is much slower and the profile is much more spread out. My hypothesis is that this may be due to a slower disk on my local machine than those in AWS. To test this, I also ran the suites locally using a ramdisk for the data directory. This provided a significant performance improvement and brought the performance profile more in line with the Evergreen runs. I did not attempt to run a full sharding suite on my local workstation because it takes a long time to run.

In the sharding data, for which only two Evergreen variants are shown, we see a large number of data points from 3-node replica set config servers. Note that the "count" value for the startupAndInitiate metric counts the number of shards in that test, not the number of nodes across all shards.

There are still some aspects of this data worth investigating further (for example, why the data consistency checks profile seems to differ between local runs and Evergreen runs), but I think this data acts as a good baseline, and we can continue to analyze these metrics as we speed up different parts of the fixtures. Below are summary statistics for the recorded metrics shown in the charts:

replica_sets
sharding
|
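The post-processing described in the report, bucketing durations into a capped histogram and computing summary statistics, can be sketched as follows. The bucket width, helper names, and sample durations are illustrative assumptions; only the 10000ms cap is taken from the report.

```python
import statistics

def bucket(durations_ms, width_ms=500, cap_ms=10000):
    """Histogram counts keyed by bucket start, dropping outliers past cap_ms
    as the charts do. width_ms=500 is an assumed bin size."""
    counts = {}
    for d in durations_ms:
        if d > cap_ms:
            continue  # outliers are excluded from the plots
        b = int(d // width_ms) * width_ms
        counts[b] = counts.get(b, 0) + 1
    return counts

def summarize(durations_ms):
    """Min/median/p75/max for one metric across all tests in a suite."""
    qs = statistics.quantiles(durations_ms, n=4)
    return {
        "min": min(durations_ms),
        "median": statistics.median(durations_ms),
        "p75": qs[2],
        "max": max(durations_ms),
    }

# Fabricated startSet durations for demonstration only.
startset_ms = [850, 910, 1700, 1750, 2600, 2650, 2700, 12000]
print(bucket(startset_ms))
print(summarize(startset_ms))
```

Grouping the rows by node count before bucketing would reproduce the colored-by-node-count view shown in the charts.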