[SERVER-78603] Introduce a resmoke option to allow benchmarks to be run in parallel Created: 03/Jul/23  Updated: 29/Oct/23  Resolved: 11/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Alex Neben
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Server Development Platform
Backwards Compatibility: Fully Compatible
Participants:

 Description   

Currently, resmoke.py doesn't allow benchmarks to be run concurrently (-j must be 1) and rightfully so. However, there are cases where I want to confirm locally that some changes that I have made didn't break them (as in, didn't cause them to crash or fail in some unexpected way, as opposed to introduction of some performance regression).

It would be nice to have a target called something like "smoke-benchmarks" or something like that which allows me to specify -j$(nproc --all) with which I can smoke test that they all complete without errors (just like unittests).
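The requested workflow could be sketched as below. Note that `smoke-benchmarks` is the hypothetical target name from this ticket, and the exact argv layout here is an assumption for illustration, not resmoke's shipped interface:

```python
import os

def build_smoke_benchmark_argv(suite="benchmarks"):
    """Construct a hypothetical resmoke invocation that runs the
    benchmark suite with one job per available CPU, purely as a
    crash/smoke check rather than a performance measurement."""
    jobs = os.cpu_count() or 1  # Python equivalent of -j$(nproc --all)
    return [
        "python", "buildscripts/resmoke.py", "run",
        f"--suites={suite}",
        f"--jobs={jobs}",
    ]

print(build_smoke_benchmark_argv())
```

The point is only to saturate the workstation's CPUs and check that every benchmark exits cleanly, exactly as unittests already do.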



 Comments   
Comment by Githook User [ 11/Jul/23 ]

Author:

{'name': 'Alex Neben', 'email': 'alexander.neben@mongodb.com', 'username': 'IamXander'}

Message: SERVER-78603 Remove single core limit on benchmarks
Branch: master
https://github.com/mongodb/mongo/commit/3a1d2d8b22eff8b0f4fa708cd85e977329cff0de

Comment by Alex Neben [ 11/Jul/23 ]

As mentioned in slack

I have put up a PR that removes the -j=1 limit. I have also included steps to run the benchmarks and what the expected output should be. We believe that these benchmarks are not independent, and running more than one at the same time causes them to fail.

Comment by Kaloian Manassiev [ 10/Jul/23 ]

What is PB?
Also, anything that's "close to an hour" will not satisfy my requirement.
My requirement is: I make some SConscript linking change, or change some fundamental library, and I want to validate very quickly, using all the resources of my virtual workstation (meaning all of its 8 CPUs), that I didn't break the benchmarks.

I really don't get why you can't just allow -j to work for benchmarks with values greater than 1. I can already do something dumb like run benchmarks on a debug build and it only emits a warning, so the same logic applies to -j8.

Comment by Steve Gross [ 10/Jul/23 ]

kaloian.manassiev@mongodb.com: if we speed up the benchmark wallclock execution time in PB (say, to less than an hour), will that meet your needs? IIUC, that will let you verify reasonably quickly that your change does(n't) break benchmarks. WDYT?

(Note: It's not theoretical; we are actively exploring ways to speed up benchmark testing in PB).

Comment by Daniel Moody [ 05/Jul/23 ]

I think there are two options we can do moving forward to help this situation.

1. Create suites which could be used like "--suites=concurrent_benchmarks,serial_benchmarks" to allow some of the suites that can safely run in parallel to do so, and contain others which must be run serially.
2. Determine an acceptable and valid minimal --benchmark_min_time value to use, and add it as an option to resmoke, or combine it into some specific resmoke benchmark suite config whose goal is speed (such as the proposed suites from 1).
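Option 1 could be sketched as a small scheduling helper. The suite names come from the suggestion above, but the partition itself and the rule that serial suites get exactly one job are assumptions for illustration:

```python
import os

# Hypothetical partition of benchmark suites into those that are safe
# to run in parallel and those that must stay serial (e.g. filesystem
# benchmarks, per the discussion in this ticket).
CONCURRENT_SUITES = ["concurrent_benchmarks"]
SERIAL_SUITES = ["serial_benchmarks"]

def plan_jobs(max_jobs=None):
    """Yield (suite, jobs) pairs: concurrent suites use all available
    CPUs, while serial suites are pinned to a single job."""
    max_jobs = max_jobs or os.cpu_count() or 1
    for suite in CONCURRENT_SUITES:
        yield suite, max_jobs
    for suite in SERIAL_SUITES:
        yield suite, 1

print(list(plan_jobs(max_jobs=8)))
```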

Comment by Jason Chan [ 05/Jul/23 ]

Assigning back to SDP. Talking to daniel.moody@mongodb.com, we think a solution could be to utilize the --benchmark_min_time option to drastically shrink the run-time of the benchmarks, and then have a concurrent test suite that runs them in parallel. There may be difficulties with the filesystem benchmarks, which may require them to run in a serial suite instead.

Comment by Kaloian Manassiev [ 03/Jul/23 ]

daniel.moody@mongodb.com, this doesn't help me. What I need to test (and I have introduced a number of BFs recently) is that a change I make, in SConscript for example, or in some central decoration initialiser, doesn't cause any of the benchmarks to crash. This is as opposed to ensuring that the performance is the same.

All I care about is that we remove the restriction of -j=1 for benchmarks. The rest of the tooling and Evergreen can continue running with 1.

Does that make sense?

Comment by Daniel Moody [ 03/Jul/23 ]

kaloian.manassiev@mongodb.com we have a high-level target "prove-benchmarks" which will only run benchmarks that have incurred changes since the last time it was run, so essentially it will only rerun benchmarks that are possibly affected, even by something like changing an include header. Rerunning with no changes will quickly ("scons" quickly, so like 30-45 seconds) reprint the last test results (pass or fail).

This does mean there is an upfront cost to run all the tests, but after that, iterations should be faster and only run the small set of affected benchmarks (unless you're working on some core header which affects everything). Is this helpful for your situation?

Comment by Billy Donahue [ 03/Jul/23 ]

The Google Benchmarks API doesn't really offer a "check for correctness" mode of operation, at least not directly.
The benchmark API reserves wide latitude on how much time the run will consume.

There is a --benchmark_min_time flag, documented directly in the code:
https://github.com/10gen/mongo/blob/244f33b7433c74c470855864f3ae6b167decbd89/src/third_party/benchmark/dist/src/benchmark.cc#L62-L67

What the documentation doesn't mention about this flag is that it also informs the benchmark's maximum time.

https://github.com/google/benchmark/blob/main/docs/user_guide.md#runtime-and-reporting-considerations

In all cases, the number of iterations for which the benchmark is run is governed by the amount of time the benchmark takes. Concretely, the number of iterations is at least one, not more than 1e9, until CPU time is greater than the minimum time, or the wallclock time is 5x minimum time. The minimum time is set per benchmark by calling MinTime on the registered benchmark object.
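The quoted rule can be restated as a small predicate. This is a paraphrase of the documented behaviour, not the library's actual implementation:

```python
MAX_ITERATIONS = int(1e9)

def should_stop(iterations, cpu_time_s, wall_time_s, min_time_s=0.5):
    """Paraphrase of Google Benchmark's documented stopping rule:
    run at least one iteration and at most 1e9, until CPU time
    exceeds min_time or wall-clock time exceeds 5x min_time.
    The 0.5s default matches --benchmark_min_time's default."""
    if iterations < 1:
        return False            # always run at least once
    if iterations >= MAX_ITERATIONS:
        return True             # hard cap on iteration count
    return cpu_time_s > min_time_s or wall_time_s > 5 * min_time_s
```

With the default min_time of 0.5s, a fast benchmark keeps iterating until it has burned 0.5s of CPU; shrinking min_time toward zero means a single iteration suffices, which is what makes an "epsilon" value attractive for correctness-only runs.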

This MinTime defaults to the --benchmark_min_time given on the command line for the benchmark. The default value is 0.5 seconds, btw.
So I think there's a path for resmoke to try running benchmarks with this flag set to an "epsilon" value to produce "correctness" executions, and then maybe those can be parallelized by tweaking resmoke.py.
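That path could look roughly like the following. The --benchmark_min_time flag is real, but the epsilon value and the idea that resmoke composes the argv this way are assumptions (and note that recent Google Benchmark versions expect a unit suffix such as "0.001s" on this flag):

```python
def benchmark_correctness_argv(binary, epsilon_s=0.001):
    """Sketch: wrap a benchmark binary so it runs in 'correctness'
    mode, doing near-minimal iterations per benchmark case, so many
    such processes can be run in parallel as a smoke test."""
    return [binary, f"--benchmark_min_time={epsilon_s}"]

print(benchmark_correctness_argv("build/install/bin/some_bm"))
```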

So I think this is all about the command lines that resmoke uses to execute benchmarks, and wouldn't involve Service Architecture.
If the CLI doesn't actually work this way, I guess we'd have to take another look, though.

Comment by Daniel Moody [ 03/Jul/23 ]

jason.chan@mongodb.com even if we are talking about correctness, does that imply someone (or Service Arch) will have increased maintenance overhead for the extra mode for these tests? Let's say we do add a "concurrent correctness mode": if the benchmarks fail because they are running in parallel, what does that imply or mean? How does one differentiate that failure?

I am not opposed to the idea, but I am trying to understand exactly how much value is here.

Comment by Jason Chan [ 03/Jul/23 ]

I can assign this to Service Arch to triage to answer questions around whether we think there's anything interesting in regards to parallelizing the workloads, but my understanding of the request here is that it's mostly about completing the workloads without failure, as opposed to measuring actual performance. We can discuss the ownership of this once I gather feedback on daniel.moody@mongodb.com's questions.

I'd be interested to get SDP's opinions as well on whether there are alternatives we haven't explored here around getting a better signal so developers can better catch things like this locally.

Comment by Daniel Moody [ 03/Jul/23 ]

The question is: why is it intentionally serial in the first place? Are the benchmark tests sensitive to running in parallel? They seem to be measuring performance; if they are running concurrently, will that impact the results?

Yes, the implementation of running some tests concurrently is in resmoke, and other list-based tests (like unittests) already do this, but is that what we really want to do for these benchmark tests?

Comment by Steve Gross [ 03/Jul/23 ]

daniel.moody@mongodb.com, can you weigh in here? IIRC, the discussion centered on the desirability of parallelization in the first place...?

Comment by Steve Gross [ 03/Jul/23 ]

blake.oler@mongodb.com jason.chan@mongodb.com Per SDP discussion, it sounds like this would be owned by service arch. Can y'all advise?

Generated at Thu Feb 08 06:38:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.