[SERVER-78190] Evaluate feasibility of parallelizing the "benchmarks" task
| Created: 16/Jun/23 | Updated: 29/Oct/23 | Resolved: 08/Aug/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Steve Gross | Assignee: | Steve Gross |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Participants: | |||||
| Linked BF Score: | 149 | ||||
| Description |
|
In this BF, we identified a task that regularly takes too long to run (close to the 6H timeout). Although we are temporarily mitigating the timeout risk (by raising the timeout from 5H to 6H), a better long-term fix is to parallelize the subtasks within the problematic task, thereby reducing total wall-clock execution time. The task in question is "benchmarks", defined here. More details are available in this Slack thread. |
| Comments |
| Comment by Steve Gross [ 08/Aug/23 ] |
|
I checked the task execution history (first_half, second_half). For the most part, the tasks are succeeding. Spot checks of individual failures showed they are due to reasons OTHER than timeouts. Thus, I think we can safely conclude the refactor successfully addressed the timeout concern. |
| Comment by Githook User [ 02/Aug/23 ] |
|
Author: {'name': 'Steve Gross', 'email': 'steve.gross@mongodb.com', 'username': 'stevegrossmongodb'} Message: |
| Comment by Githook User [ 01/Aug/23 ] |
|
Author: {'name': 'Steve Gross', 'email': 'steve.gross@mongodb.com', 'username': 'stevegrossmongodb'} Message: |
| Comment by Alex Neben [ 25/Jul/23 ] |
|
This solution will work well! A little gross to maintain but this should last another 5 years or so |
| Comment by Steve Gross [ 20/Jul/23 ] |
|
Note to self: Slack thread to discuss more |
| Comment by Tausif Rahman (Inactive) [ 19/Jul/23 ] |
|
One thing to add: it looks like benchmarks_orphaned runs tests from the build/ directory, which implies that it depends on compile. I see that the benchmarks_template "depends on" compile here. This can cause issues because version_gen is currently the first task that runs and "depends on" nothing, and I don't think we want that to change. Alternatively, we could have a new `version_benchmark_gen` task which "depends on" compile and generates only benchmark tasks, but some design would be needed. Also, the task generator currently does not add the "do benchmark setup" function to generated tasks, but that function is a requirement for `benchmark*` tasks. We probably need the task generator to conditionally add the "do benchmark setup" function (and maybe others) when it encounters a benchmark task that needs to be generated. |
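A rough sketch of the conditional step described above, for discussion only. The helper name `commands_for_generated_task`, the `{"func": ...}` command shape, and the choice to put the setup step first are all assumptions made for illustration; none of this is taken from the actual task generator.

```python
from typing import Dict, List

# Function name quoted in the comment above; everything else here is hypothetical.
BENCHMARK_SETUP_FUNC = "do benchmark setup"


def commands_for_generated_task(
    task_name: str, base_commands: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return the command list for a generated task, adding benchmark setup if needed."""
    commands = list(base_commands)  # copy so the caller's list is not mutated
    if task_name.startswith("benchmark"):
        setup = {"func": BENCHMARK_SETUP_FUNC}
        # Only add the setup step once (assumption: it should run before everything else).
        if setup not in commands:
            commands.insert(0, setup)
    return commands
```

Non-benchmark tasks pass through unchanged, so the generator's existing behavior would stay the same for everything outside `benchmark*`.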
| Comment by Steve Gross [ 19/Jul/23 ] |
|
Notes from chat w/ jeff.zambory@mongodb.com:
|
| Comment by Steve Gross [ 12/Jul/23 ] |
|
I've created this draft PR as a POC to use the new max_duration field. Will discuss with various folks to see if it's the right way to move forward. |
| Comment by Steve Gross [ 10/Jul/23 ] |
|
Per jeff.zambory@mongodb.com 's note, I'll explore how to amend the task generator to leverage the new field. |
| Comment by Steve Gross [ 22/Jun/23 ] |
|
I spent some quality time w/ mikhail.shchatko@mongodb.com reviewing the split_task() code. Now I need to play around with executing the unit test, and circle back w/ daniel.moody@mongodb.com to think about what the amended logic should look like. |
| Comment by Steve Gross [ 21/Jun/23 ] |
|
Per Slack discussion: it looks like the best solution is to amend the task generator's "bucketing" logic (defined here and here). I'll propose some new logic and get consensus before trying to implement it. |
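To make the "bucketing" idea concrete, here is a minimal sketch of one possible approach: place the longest-running subtasks first, each into whichever bucket currently has the smallest total runtime. This is a generic greedy heuristic, not the task generator's existing logic; the function name `bucket_by_runtime` and the runtime inputs are hypothetical.

```python
import heapq
from typing import Dict, List


def bucket_by_runtime(runtimes: Dict[str, float], n_buckets: int) -> List[List[str]]:
    """Greedy longest-first split of subtasks into n_buckets (hypothetical helper)."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    # Heap entries are (total_runtime, bucket_index); the lightest bucket pops first.
    heap = [(0.0, i) for i in range(n_buckets)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(n_buckets)]
    # Place the longest subtasks first so the short ones can even out the totals.
    for name, runtime in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + runtime, idx))
    return buckets
```

With two buckets this would produce a split along the lines of a first half and a second half, each taking roughly half of the total runtime if historical runtimes are a fair predictor.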
| Comment by Steve Gross [ 21/Jun/23 ] |
|
Soliciting design guidance on Slack |