[SERVER-45056] Add ReplSetTest and ShardingTest control tests sys-perf workload to detect future performance regressions Created: 11/Dec/19  Updated: 06/Dec/22  Resolved: 05/Nov/21

Status: Closed
Project: Core Server
Component/s: Performance, Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: former-quick-wins, perf-qwin, quick-win
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Replication
Sprint: STM 2021-01-25
Participants:
Story Points: 3

 Description   

As part of PM-1360, ReplSetTest and ShardingTest control tests were created to measure the scalability and performance of the setup and teardown procedures of those test fixtures. To detect future performance changes in ReplSetTest and ShardingTest, we should add a perf workload that measures the total execution time of these tests. For completeness, it would be nice to include the following tests:

This will help us detect changes to single-node startup performance as well as the scalability of the test fixtures with many nodes.
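A control workload along these lines could be as small as a mongo shell script that times fixture setup and teardown directly. The sketch below is illustrative only: the node counts and the overall shape of the script are assumptions, not anything specified in this ticket.

// Illustrative only: times ReplSetTest setup and teardown at a couple of
// assumed node counts. ReplSetTest is available by default in the mongo shell.
(function() {
    "use strict";

    [1, 3].forEach(function(numNodes) {
        const setupStart = Date.now();
        const rst = new ReplSetTest({nodes: numNodes});
        rst.startSet();
        rst.initiate();
        const setupMillis = Date.now() - setupStart;

        const teardownStart = Date.now();
        rst.stopSet();
        const teardownMillis = Date.now() - teardownStart;

        print("ReplSetTest setup (" + numNodes + " node(s)): " + setupMillis + " ms");
        print("ReplSetTest teardown (" + numNodes + " node(s)): " + teardownMillis + " ms");
    });
})();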



 Comments   
Comment by Judah Schvimer [ 05/Nov/21 ]

Closing in favor of EVG-268 as described in previous comments.

Comment by Judah Schvimer [ 09/Aug/21 ]

max.hirschhorn pointed out to me that EVG-268 would be a potentially more robust and easier way to track this. One concern would be that job parallelism could impact the numbers by stealing resources from the control tests. We may want to make the control tests their own task with only 1 job to mitigate this.

Comment by Ryan Timmons [ 21/Jan/21 ]

Sounds good. Thanks, Judah.

I'm going to assign to the repl backlog user to coordinate doing this in a quick-win or similar.

STM is happy to assist if there is any ambiguity about how to use DSI and other tooling to get this done.

Comment by Judah Schvimer [ 20/Jan/21 ]

Thanks ryan.timmons for looking into this! This is a significant developer productivity win across the board and in theory makes patch builds faster as well. As such, I think whichever team could do the work most effectively and efficiently would be appropriate. Replication originally created the tests, but I don't think it has any special expertise for moving them to DSI, and it is not the only team to benefit from this work. That said, if this type of work is intended to be self-service, we could make it a repl quick win to do at some point in the future. I think it will fall below other work the product team is interested in, though.

Comment by Ryan Timmons [ 19/Jan/21 ]

I'd advocate putting this in DSI, given it already provides rich infrastructure for exactly the kinds of perf regression testing we're aiming for here.

To do this, you'd create a new test_control.yml file (in the DSI repo) that runs a JS test in the mongo repo. You could use the bestbuy_query test_control as a starting point. Instead of running run_workloads.py, this test_control would just run the mongo shell:

~/bin/mongo ${mongodb_setup.meta.shell_ssl_options} --host "${mongodb_setup.meta.hostname}" --port="${mongodb_setup.meta.port}" ./src/mongo/jstests/ReplSetTest-PerfTest.js

The js runner, ReplSetTest-PerfTest.js in this example, would output whatever performance numbers the team thinks are useful. We already include the jstests directory in the mongo artifact that we download to all machines as a part of sys-perf and microbenchmarks workloads.

In general our JS testing can be rather chatty, so you'd need to write a regex or something that determines which stdout lines indicate the desired performance numbers to collect. If you want to ride existing rails, the script could just output lines prefixed with greater-than signs and use the existing mongoshell output parser. The format is self-explanatory from that code.
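For illustration (the exact prefix and line format the existing mongoshell parser expects are an assumption here, so check that parser before relying on this), the perf lines emitted by the JS runner and the regex used to collect them could look something like:

// Inside ReplSetTest-PerfTest.js: prefix the perf lines so they stand out
// from the rest of the chatty test output (prefix and format are assumptions).
print(">>> ReplSetTest_setup_3_nodes " + setupMillis + " ms");
print(">>> ReplSetTest_teardown_3_nodes " + teardownMillis + " ms");

// A collection step could then pull these out with a regex along the lines of:
//     /^>>> (\S+) (\d+) ms$/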

I don't think new DSI functionality is needed here. As such, I think this is self-service and I'd like to encourage Repl (er..core server) to take this on as a new sys-perf or microbenchmarks task.

Just re-read Rob's comment re: the potential for these tests to be noisy. That is a known/expected thing, especially for new workloads. We would provide some guidance on tuning to make the test as noise-free as possible; you might be surprised by how little noise there actually is. But even if that's not possible, we can let build-barons know not to triage the test, and thus it would be purely informative.

Alternatively, we could make the task run only when manually scheduled (or perhaps very infrequently). TL;DR: there are lots of good reasons to put this into sys-perf or microbenchmarks and very few reasons not to, all of which have known solutions.

In the interest of not ping-ponging this ticket around, let's chat if we decide to re-assign it back to dev-prod. If there are gaps in DSI's offerings, I'd like to capture those as discrete items so these kinds of workloads are easy to write, run, and debug in a self-service way.

judah.schvimer what is the right team in your estimation to carry this forward?

Comment by Ryan Timmons [ 29/Sep/20 ]

FWIW it would probably be pretty straightforward and self-service to add a workload using the mongo-perf workloads. These workloads run arbitrary shell code.

They currently only run on microbenchmarks machines, but adding them to sys-perf to run on AWS machines is also straightforward and self-service. We could help with this portion of the work.

I'd like to encourage these tests to be written by the users who request it since they have a better understanding of exactly what kinds of performance regressions they want to prevent.

Comment by Judah Schvimer [ 24/Sep/20 ]

I'm fine not automating it at first. I'm just nervous about expecting the replication and sharding teams to periodically take a look. Having the information available to look at would still be a huge win, though.

Comment by Robert Guo (Inactive) [ 23/Sep/20 ]

We do get perf trend charts to see changes visually. I didn't plan on integrating signal processing since it's mostly used for user-facing workloads and not more internal ones like Google Benchmarks.

I'm a bit hesitant to run more rigorous analysis on these ad-hoc JS perf tests because there is likely to be a higher noise-to-signal ratio and more false-positive change points (which is why we have dedicated perf clusters); but if you'd like automated analysis, I'm happy to sit down with the DAG and BB teams after workloads from this ticket have been running for a while to see what we can automate. The alternative is manual analysis, where the test owners go look at the results every few weeks, like what we do for gbench.

Comment by Judah Schvimer [ 23/Sep/20 ]

If this is running in the correctness project, will we be able to run signal processing on it to see when the perf changes?

Comment by Brooke Miller [ 22/Sep/20 ]

In triage, robert.guo suggested that we write a shell helper to generate a perf report from a shell test, plus a resmoke mechanism to combine the perf reports. The test will run in the correctness project, since the correctness test fixtures require only a single machine, which does not fit the sys-perf setup.
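A minimal sketch of what such a shell helper might look like, assuming a perf.json-style report shape; the helper name, field names, and file name are all assumptions rather than an agreed design:

// Hypothetical helper for accumulating timings and writing a small JSON
// report that a resmoke step could later combine with other tests' reports.
function PerfReport() {
    this.entries = [];
}

PerfReport.prototype.add = function(testName, millis) {
    this.entries.push({name: testName, results: {"1": {elapsed_ms: millis}}});
};

PerfReport.prototype.write = function(path) {
    // writeFile() is a mongo shell utility that writes a string to disk.
    writeFile(path, JSON.stringify({results: this.entries}));
};

// Usage from a control test:
// const report = new PerfReport();
// report.add("ReplSetTest_setup_3_nodes", setupMillis);
// report.write("replsettest_perf_report.json");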

Comment by Judah Schvimer [ 14/Jan/20 ]

We should also add control tests for RollbackTest since we've worked to speed that up as well.

Comment by William Schultz (Inactive) [ 11/Dec/19 ]

judah.schvimer Not really, it just might mean we would encounter more significant perf fluctuations after we add the perf workload. I'm sure we could deal with them or ignore them appropriately, though. We could also do the work to create the workload but disable it until the work from PM-1360 is done, I suppose.

Comment by Judah Schvimer [ 11/Dec/19 ]

william.schultz, would it be a problem if this were done concurrently with PM-1360?

Comment by William Schultz (Inactive) [ 11/Dec/19 ]

The thought is that this can be completed separately from PM-1360, after that project's performance improvements to ReplSetTest and ShardingTest are finished.
