[SERVER-33162] Make the update fuzzer run faster Created: 07/Feb/18  Updated: 06/Dec/22  Resolved: 03/Dec/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Won't Fix Votes: 0
Labels: stm
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Server Tooling & Methods
Participants:

 Description   

The update fuzzer runs up to 15x slower on the 3.6 branch compared with master. The slowdown is especially pronounced on the OS X and Code Coverage variants (Enterprise OS X MMAP1, SSL OS X, RHEL 6.2 DEBUG Code Coverage), where it causes timeouts in the update_fuzzer_replication suite. The other variants all see a 2x to 5x slowdown, which is not enough to trigger a timeout.

The slowdown is likely caused by the additional work of resetting the database and running more JavaScript in the shell to handle blacklisting. 3.7 has not had a similar problem so far, but it should experience a similar slowdown as more known differences are discovered between 3.7 and 3.6.

It might be a good idea to proactively reduce the number of fuzzer-generated files and the size of each file to prevent sporadic failures in the future, especially for the update_fuzzer_replication suite.



 Comments   
Comment by Brooke Miller [ 03/Dec/21 ]

STM doesn't have bandwidth to pick this up, so we're closing as won't fix. However, if other teams have capacity and would like to pursue this, please feel free to pick this up.

Comment by Max Hirschhorn [ 08/Feb/18 ]

Quoting Robert Guo: "max.hirschhorn, I didn't think of setting continue_on_failure=true. Is the goal to ensure we have some coverage data even if the task times out?"

Sorry, I managed to type continue_on_failure=true when looking to see what defaults are set via the &jstestfuzz_config_vars anchor. What I really meant to say is "Should we start out with setting run_multiple_jobs=true for the update fuzzer because each of the tests starts its own MongoDB deployment?"

Even when the Evergreen task times out, the code coverage generated thus far is still uploaded to S3. I don't think there's a lot of value in running with continue_on_failure=true: any failure in one of the generated files is a signal that something has gone wrong, and knowing that five more generated files would have failed as well doesn't add much signal. I think Evergreen's stepback should be "good enough" to help identify the problematic commit.

Quoting Robert Guo: "Lowering the number of generated files for OS X and ARM sounds good. But we'd probably have to introduce a new 'setMultiplier' on the fuzzer's side since the number of generated files is different for different suites."

I was thinking of just having an expansion that defines a fixed number of generated files to create and using it in each of the task definitions. We could also define a ratio, but I'm not sure it is worth introducing complexity equivalent to how the number of resmoke.py jobs is determined by the "run tests" function.
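
To make the trade-off between the two options concrete, here is a minimal sketch in Python. It is not the fuzzer's or Evergreen's actual code: the expansion key num_generated_files, the BASE_NUM_FILES table and its counts, and both function names are hypothetical and exist only for illustration.

```python
import math

# Hypothetical per-suite baselines; the real counts are not stated in this ticket.
BASE_NUM_FILES = {
    "update_fuzzer": 100,
    "update_fuzzer_replication": 50,
}


def files_fixed(variant_expansions, default):
    """Option 1: a build variant sets a fixed count through an expansion.

    `num_generated_files` is an assumed expansion name. Expansion values are
    treated as strings here, so the value is converted to int.
    """
    return int(variant_expansions.get("num_generated_files", default))


def files_scaled(suite, multiplier):
    """Option 2: scale the suite's own baseline by a per-variant ratio.

    Always returns at least one file so a small ratio never disables the suite.
    """
    return max(1, math.floor(BASE_NUM_FILES[suite] * multiplier))
```

With the fixed option, a slow variant would set one number that every fuzzer task reads. With the ratio option, each suite keeps its own baseline, which is roughly what a "setMultiplier" would have to do on the fuzzer's side.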

Comment by Robert Guo (Inactive) [ 08/Feb/18 ]

max.hirschhorn, I didn't think of setting continue_on_failure=true. Is the goal to ensure we have some coverage data even if the task times out?

Lowering the number of generated files for OS X and ARM sounds good. But we'd probably have to introduce a new "setMultiplier" on the fuzzer's side since the number of generated files is different for different suites.

Comment by Max Hirschhorn [ 07/Feb/18 ]

robert.guo, should we start out with setting continue_on_failure=true for the update fuzzer because each of the tests starts its own MongoDB deployment? It won't address any issues with the OS X machines, but I've been debating whether we should lower the number of generated files we run on the OS X and ARM machines altogether (i.e. even for jstestfuzz* Evergreen tasks).

Generated at Thu Feb 08 04:32:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.