[SERVER-78190] Evaluate feasibility of parallelizing the "benchmarks" task Created: 16/Jun/23  Updated: 29/Oct/23  Resolved: 08/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Steve Gross Assignee: Steve Gross
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Participants:
Linked BF Score: 149

 Description   

In this BF, we identified a task that regularly takes too long to run (close to the 6H timeout). We are temporarily mitigating the timeout risk by raising the timeout from 5H to 6H, but a better long-term fix is to parallelize the subtasks within the problematic task, thereby reducing total wall-clock execution time.

The task in question is "benchmarks", defined here. More details are available in this Slack thread.



 Comments   
Comment by Steve Gross [ 08/Aug/23 ]

I checked the task execution history (first_half, second_half). For the most part, the tasks are succeeding. Spot checks of individual failures showed they are due to reasons OTHER than timeouts. Thus, I think we can safely conclude the refactor successfully addressed the timeout concern.

Comment by Githook User [ 02/Aug/23 ]

Author: Steve Gross <steve.gross@mongodb.com> (stevegrossmongodb)

Message: SERVER-78190 Split up benchmarks_orphaned into 2 better-balanced tasks
Branch: minh.luu-no_compile_sys-perf
https://github.com/mongodb/mongo/commit/0780f7a8783a1adf6fdb6731de01392c31d29596

Comment by Githook User [ 01/Aug/23 ]

Author: Steve Gross <steve.gross@mongodb.com> (stevegrossmongodb)

Message: SERVER-78190 Split up benchmarks_orphaned into 2 better-balanced tasks
Branch: master
https://github.com/mongodb/mongo/commit/0780f7a8783a1adf6fdb6731de01392c31d29596

Comment by Alex Neben [ 25/Jul/23 ]

This solution will work well! It is a little gross to maintain, but it should last another 5 years or so.

Comment by Steve Gross [ 20/Jul/23 ]

Note to self: Slack thread to discuss more

Comment by Tausif Rahman (Inactive) [ 19/Jul/23 ]

One thing to add: it looks like benchmarks_orphaned runs tests from the build/ directory, which implies that it depends on compile. I see that the benchmarks_template "depends on" compile here.

This can cause issues because version_gen is currently the first task that runs and "depends on" no other task. I don't think we want that to change. Alternatively, we can have a new `version_benchmark_gen` task which "depends on" compile and generates only benchmark tasks, but that would need some design work.

Also, currently the task generator will not add the "do benchmark setup" function to generated tasks, but this is a requirement for `benchmark*` tasks. We probably need the task generator to conditionally add the "do benchmark setup" function (& maybe others) when the task generator encounters a benchmark task that needs to be generated.
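The conditional step described above could look roughly like the following. This is only a sketch under assumptions: the dict-based task shape, the `is_benchmark_task` predicate, the placeholder "run tests" command, and the choice to put the setup function first are all hypothetical and are not taken from the real task generator.

```python
from typing import Any, Dict, List

# Assumption: a generated task is a plain dict with a "name" and a list of
# "commands", loosely mirroring Evergreen YAML. The real data model may differ.
BENCHMARK_SETUP = {"func": "do benchmark setup"}


def is_benchmark_task(task_name: str) -> bool:
    """Hypothetical predicate: treat any task named benchmark* as a benchmark task."""
    return task_name.startswith("benchmark")


def with_benchmark_setup(task: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the task with the setup function prepended when required."""
    commands: List[Dict[str, Any]] = list(task.get("commands", []))
    # Only benchmark tasks get the extra function, and it is never added twice.
    if is_benchmark_task(task["name"]) and BENCHMARK_SETUP not in commands:
        commands.insert(0, dict(BENCHMARK_SETUP))
    return {**task, "commands": commands}


# Placeholder tasks for illustration only.
benchmark = with_benchmark_setup(
    {"name": "benchmarks_orphaned_0", "commands": [{"func": "run tests"}]}
)
other = with_benchmark_setup({"name": "jsCore", "commands": [{"func": "run tests"}]})
```

Keeping the check in one predicate means other functions that benchmark tasks might need could be added in the same place.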

Comment by Steve Gross [ 19/Jul/23 ]

Notes from chat w/ jeff.zambory@mongodb.com:

  • To change a task configuration to use task-gen (and split_task) logic, rename the "function" to "generate resmoke tasks"
  • The task immediately in question is benchmarks_orphaned
  • This PB shows that the benchmarks_orphaned task has 30+ tests in it
  • There are other benchmark_* tasks, but for now we'll focus on benchmarks_orphaned since that is the original motivating case for this work.

Comment by Steve Gross [ 12/Jul/23 ]

I've created this draft PR as a POC to use the new max_duration field. Will discuss with various folks to see if it's the right way to move forward.
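For discussion, here is one way a duration budget could drive the split. It is a sketch only: it assumes max_duration is an upper bound on the summed historical runtime of the tests in one generated subtask, which this ticket does not confirm, and `split_by_max_duration` is a hypothetical helper rather than anything in the draft PR.

```python
from typing import Dict, List


def split_by_max_duration(runtimes: Dict[str, float], max_duration: float) -> List[List[str]]:
    """First-fit decreasing: open a new subtask only when no existing one has room."""
    subtasks: List[List[str]] = []
    totals: List[float] = []
    # Longest tests first; the name is a tie-breaker so the result is deterministic.
    for name, duration in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, total in enumerate(totals):
            if total + duration <= max_duration:
                subtasks[i].append(name)
                totals[i] += duration
                break
        else:
            # No subtask had room. This also covers a single test that is longer
            # than max_duration, which ends up alone in its own subtask.
            subtasks.append([name])
            totals.append(duration)
    return subtasks


# Hypothetical runtimes in minutes, not measurements from the real task.
sample = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}
print(split_by_max_duration(sample, 60))  # [['a', 'e'], ['b', 'd'], ['c']]
```

With this approach the number of subtasks follows from the budget instead of being fixed up front.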

Comment by Steve Gross [ 10/Jul/23 ]

Per jeff.zambory@mongodb.com's note, I'll explore how to amend the task generator to leverage the new field.

Comment by Steve Gross [ 22/Jun/23 ]

I spent some quality time w/ mikhail.shchatko@mongodb.com reviewing the split_task() code. Now I need to play around with executing the unit test, and circle back w/ daniel.moody@mongodb.com to think about what the amended logic should look like.

Comment by Steve Gross [ 21/Jun/23 ]

Per Slack discussion: it looks like the best solution is to amend the task generator's "bucketing" logic (defined here and here). I'll propose some new logic and get consensus before trying to implement it.
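As a starting point for that proposal, here is a minimal sketch of runtime-aware bucketing using a greedy "longest test first" heuristic. It is not the existing split_task() logic; the function name, the input shape (test name to historical runtime), and the sample numbers are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def bucket_by_runtime(runtimes: Dict[str, float], n_buckets: int) -> List[List[str]]:
    """Greedily assign each test to the bucket with the smallest total runtime so far."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    # Heap entries are (total_runtime, bucket_index); the lightest bucket is on top.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n_buckets)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(n_buckets)]
    # Place the longest tests first; the name is a tie-breaker for determinism.
    for name, duration in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + duration, idx))
    return buckets


# Hypothetical runtimes in minutes, not measurements from the real task.
example = {"bm_a": 120, "bm_b": 90, "bm_c": 60, "bm_d": 45, "bm_e": 30, "bm_f": 15}
halves = bucket_by_runtime(example, 2)
totals = [sum(example[t] for t in half) for half in halves]
print(halves)  # [['bm_a', 'bm_d', 'bm_f'], ['bm_b', 'bm_c', 'bm_e']]
print(totals)  # [180, 180]
```

In this made-up example the serial runtime is 360 minutes, while the two buckets run in parallel at 180 minutes each. Wall-clock time is bounded by the slowest bucket, which is why balance matters more than the number of tests per bucket.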

Comment by Steve Gross [ 21/Jun/23 ]

Soliciting design guidance on Slack

Generated at Thu Feb 08 06:37:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.