[SERVER-79544] Make the task activator handle errors more intelligently Created: 14/Jul/23 Updated: 29/Oct/23 Resolved: 02/Aug/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Task | Priority: | Minor - P4 |
| Reporter: | Memento Slack Bot | Assignee: | Jeffrey Zambory |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | DAG 2023-08-07 |
| Participants: | |
| Story Points: | 1 |
| Description |
|
It seems like the configure task endpoint is pretty consistently timing out for this particular patch/task when it tries to activate the tasks generated for it. Here's the exact endpoint being hit.

This patch is large, with a pretty large number of generated tasks within it. All of the tasks under the [JSTEST AFFECTED] variants are attempting to be activated.

Is there anything that can be done to allow this call to succeed? Should this endpoint be batched on the client side instead, so that we try to activate only a certain number of tasks within each network call? Are there any guidelines or best practices around how many tasks we should be activating at a single time?

------------------------------------------------------------

AC:
|
| Comments |
| Comment by Githook User [ 02/Aug/23 ] |
|
Author: Jeff Zambory (jeff.zambory@mongodb.com) Message: |
| Comment by Githook User [ 01/Aug/23 ] |
|
Author: Jeff Zambory (jeff.zambory@mongodb.com) Message: |
| Comment by Jeffrey Zambory [ 21/Jul/23 ] |
|
Gotcha, thanks kimberly.tao@mongodb.com. Moving this over to DAG so that we batch how many tasks get activated at a time. |
| Comment by Kim Tao [ 19/Jul/23 ] |
|
1. The main limit is the 60-second server timeout on all requests. I don't think we have a hard correlation between the number of tasks to activate and the time the request takes to run (due to configuration-specific concerns like the number of dependencies that the tasks have), but activating ~1000 at a time should be feasible within 60 seconds. You can experiment with particular batch sizes to see what's reasonable, or maybe try activating individual build variants rather than individual tasks. |
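A minimal sketch of the client-side batching suggested above, assuming the task IDs are already collected in a list. The helper name `batched_task_ids` and the default of 1000 are illustrative only (the default is taken from the rough figure in this comment), not part of any existing client.

```python
from typing import Iterator, List, Sequence


def batched_task_ids(task_ids: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of task_ids, each with at most batch_size entries.

    Hypothetical helper: each yielded list would be sent in its own
    configure request so that no single call has to activate every task.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(task_ids), batch_size):
        yield list(task_ids[start:start + batch_size])
```

The batch size is a parameter so it can be tuned experimentally against the 60-second timeout, since the time per task depends on configuration-specific factors such as dependencies.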
| Comment by Jeffrey Zambory [ 18/Jul/23 ] |
|
That would likely be doable, but I would like to learn more about how this might affect things. If we begin batching calls, we might wind up with some patches that are in a half-activated state if some later calls fail. That isn't the worst thing in the world, but it is definitely a weird state to be in. Some questions:
|
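One possible way to handle the half-activated state described above is to retry each failed batch a few times and then report exactly which tasks were never activated. The sketch below assumes a caller-supplied `activate` callable that raises on failure; that callable, the function name, and the retry count are all hypothetical and do not describe the actual fix.

```python
from typing import Callable, List, Sequence, Tuple


def activate_in_batches(
    task_ids: Sequence[str],
    activate: Callable[[List[str]], None],  # hypothetical: sends one configure request
    batch_size: int = 1000,
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Activate tasks batch by batch and return (activated, failed) task IDs.

    A batch that still fails after max_attempts is recorded in `failed`
    rather than aborting the run, so the caller can see which part of the
    patch is not yet activated and retry or report it.
    """
    activated: List[str] = []
    failed: List[str] = []
    for start in range(0, len(task_ids), batch_size):
        batch = list(task_ids[start:start + batch_size])
        for attempt in range(1, max_attempts + 1):
            try:
                activate(batch)
                activated.extend(batch)
                break
            except Exception:
                # Give up on this batch only after the final attempt.
                if attempt == max_attempts:
                    failed.extend(batch)
    return activated, failed
```

Continuing past a failed batch is a design choice in this sketch; stopping at the first failure would be equally valid if a partially activated patch is considered worse than an unactivated one.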
| Comment by Kim Tao [ 17/Jul/23 ] |
|
Is it a sufficient workaround to submit configure requests in batches rather than in one big request (as described in the thread)? This is likely not easy to fix, since it's a performance issue with patch configuration. |