[SERVER-78603] Introduce a resmoke option to allow benchmarks to be run in parallel Created: 03/Jul/23 Updated: 29/Oct/23 Resolved: 11/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Alex Neben |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Server Development Platform |
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
Currently, resmoke.py doesn't allow benchmarks to be run concurrently (-j must be 1), and rightfully so. However, there are cases where I want to confirm locally that changes I have made didn't break them (as in, didn't cause them to crash or fail in some unexpected way, as opposed to introducing a performance regression). It would be nice to have a target called something like "smoke-benchmarks" which allows me to specify -j$(nproc --all), with which I can smoke-test that they all complete without errors (just like unittests).
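A sketch of the requested workflow, assuming resmoke's usual run entry point; the suite name below is illustrative rather than a reference to an existing target:

    # Hypothetical smoke-test invocation once the -j restriction is lifted;
    # "benchmarks" stands in for whichever benchmark suite(s) apply.
    python buildscripts/resmoke.py run --suites=benchmarks -j$(nproc --all)
|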
| Comments |
| Comment by Githook User [ 11/Jul/23 ] |
|
Author: Alex Neben (alexander.neben@mongodb.com, username: IamXander)
Message: |
| Comment by Alex Neben [ 11/Jul/23 ] |
|
As mentioned in Slack, I have put up a PR that removes the limit of -j=1. I have also included steps to run benchmarks and what the expected output should be. We believe that these benchmarks are not independent, and running more than one at the same time can cause them to fail. |
| Comment by Kaloian Manassiev [ 10/Jul/23 ] |
|
What is PB? I really don't get why you can't just allow -j to work for benchmarks with values greater than 1. I can already do something dumb like run benchmarks on a debug build and it just emits a warning, so the same logic applies to -j8. |
| Comment by Steve Gross [ 10/Jul/23 ] |
|
kaloian.manassiev@mongodb.com: if we speed up the benchmark wallclock execution time in PB (say, to less than an hour), will that meet your needs? IIUC, that will let you verify reasonably quickly that your change does(n't) break benchmarks. WDYT? (Note: It's not theoretical; we are actively exploring ways to speed up benchmark testing in PB). |
| Comment by Daniel Moody [ 05/Jul/23 ] |
|
I think there are two options we could pursue moving forward to help this situation. 1. Create suites which could be used like "--suites=concurrent_benchmarks,serial_benchmarks", so that the suites which can safely run in parallel do so, while the others which must be run serially are kept separate.
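A sketch of how that split might look from the command line; both suite names come from the suggestion above and are hypothetical, not existing resmoke suites:

    # Parallel-safe benchmarks fan out across jobs; the rest stay serial.
    python buildscripts/resmoke.py run --suites=concurrent_benchmarks -j8
    python buildscripts/resmoke.py run --suites=serial_benchmarks -j1
|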
| Comment by Jason Chan [ 05/Jul/23 ] |
|
Assigning back to SDP. Talking to daniel.moody@mongodb.com, we think a solution could be to utilize the --benchmark_min_time option to drastically shrink the run-time of the benchmarks, and then have a concurrent test suite that runs them in parallel. There may be difficulties with the filesystem benchmarks, which may require them to run in a serial suite instead.
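A sketch of the combined approach; whether resmoke exposes a pass-through option like --benchmarkMinTimeSecs is an assumption here, so treat that flag's spelling as hypothetical:

    # Shrink each benchmark's measurement window so a parallel sweep becomes a
    # quick pass/fail signal rather than a performance measurement.
    python buildscripts/resmoke.py run --suites=benchmarks -j8 --benchmarkMinTimeSecs=0.01
|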
| Comment by Kaloian Manassiev [ 03/Jul/23 ] |
|
daniel.moody@mongodb.com, this doesn't help me. What I need to test (and I have introduced a number of BFs recently) is that a change I make, in a SConscript for example, or in some central decoration initialiser, doesn't cause any of the benchmarks to crash. This is as opposed to ensuring that the performance is the same. All I care about is that we remove the restriction of -j=1 for benchmarks. The rest of the tooling and Evergreen can continue running with 1. Does that make sense? |
| Comment by Daniel Moody [ 03/Jul/23 ] |
|
kaloian.manassiev@mongodb.com we have a high-level target "prove-benchmarks" which will only run benchmarks that have incurred changes since the last time it was run, so essentially it will only rerun benchmarks which are possibly affected, even by something as indirect as a change to an include header. Rerunning with no changes will quickly ("scons" quickly, so like 30-45 seconds) reprint the last test results (pass or fail). This does mean there is an upfront cost to run all the tests, but after that, iterations should be faster and only run a small set of affected benchmarks (unless you're working on some core header which affects everything). Is this helpful for your situation?
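For reference, a sketch of that workflow, assuming the standard MongoDB scons entry point (any build flags you normally pass would apply here too):

    # The first run pays the full cost; later runs re-execute only benchmarks
    # whose dependencies changed, and otherwise reprint the cached results.
    python3 buildscripts/scons.py prove-benchmarks
|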
| Comment by Billy Donahue [ 03/Jul/23 ] |
|
The Google Benchmark API doesn't really offer a "check for correctness" mode of operation, at least not directly. There is a --benchmark_min_time flag, documented directly in the code. What the documentation doesn't mention about this flag is that it also informs the benchmark's maximum time.
This MinTime defaults to the --benchmark_min_time given on the command line for the benchmark (the default value is 0.5 seconds, btw). So I think this is all about the command lines that resmoke uses to execute benchmarks, and wouldn't involve Service Architecture.
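To illustrate the command-line angle, a sketch of invoking a compiled benchmark binary directly; the binary path is made up, but --benchmark_min_time is a real Google Benchmark flag (interpreted as seconds here):

    # A tiny min time turns the run into a "does it complete without errors"
    # check instead of a stable performance measurement.
    ./build/install/bin/some_benchmark --benchmark_min_time=0.01
|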
| Comment by Daniel Moody [ 03/Jul/23 ] |
|
jason.chan@mongodb.com even if we are talking about correctness, does that imply someone (or Service Arch) will have increased maintenance overhead for the extra mode for these tests? Let's say we do add a "concurrent correctness mode": if the benchmarks fail because they are running in parallel, what does that imply or mean? How does one differentiate that failure? I am not opposed to the idea, but I am trying to understand exactly how much value is here. |
| Comment by Jason Chan [ 03/Jul/23 ] |
|
I can assign this to Service Arch to triage and answer questions around whether we think there's anything interesting with regard to parallelizing the workloads, but my understanding of the request here is that it's mostly about completing the workloads without failure, as opposed to measuring actual performance. We can discuss ownership once I gather feedback on daniel.moody@mongodb.com's questions. I'd also be interested in SDP's opinion on whether there are alternatives we haven't explored for getting developers a better signal, so they can catch things like this locally. |
| Comment by Daniel Moody [ 03/Jul/23 ] |
|
The question is: why is it intentionally serial in the first place? Are the benchmark tests sensitive to running in parallel? They seem to be measuring performance; if they run concurrently, will that impact the results? Yes, the implementation of running tests concurrently lives in resmoke, and other test-list based tests (like unittests) already do this, but is that what we really want for these benchmark tests? |
| Comment by Steve Gross [ 03/Jul/23 ] |
|
daniel.moody@mongodb.com, can you weigh in here? IIRC, the discussion centered on the desirability of parallelization in the first place...? |
| Comment by Steve Gross [ 03/Jul/23 ] |
|
blake.oler@mongodb.com jason.chan@mongodb.com Per SDP discussion, it sounds like this would be owned by service arch. Can y'all advise? |