We currently only track, and make available to the timeout calculator, the average time that a test recently took to pass. This metric works well for most task timeout calculations, since most tests are expected to adhere to a set runtime and deviations from it indicate a problem.
However, tests such as benchmarks can have a semi-random runtime instead and can vary widely from run to run. This makes it difficult for the task generator to split up benchmark tasks effectively, and can cause semi-random timeouts as well as task groups with very uneven wall clock runtimes.
To help minimize the risk of these timeouts, we instead want to begin bucketing benchmark tests based on their recent max runtime. This should be a better metric for benchmark tests and reduce the risk of new BFs being filed and then closed as noise. A rough sketch of the intended metric selection is below.
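As a minimal illustration only (the function and parameter names here are hypothetical, and the real bucketing logic belongs to the follow-up task generation work):

```python
from typing import Optional


def duration_for_bucketing(avg_duration_pass: float,
                           max_duration_pass: Optional[float],
                           is_benchmark: bool) -> float:
    """Pick the runtime metric used to bucket a test into generated tasks."""
    # For benchmarks, prefer the recent max so a single slow run does not
    # blow past a timeout derived from the average; fall back to the
    # average when no max has been recorded yet.
    if is_benchmark and max_duration_pass is not None:
        return max_duration_pass
    return avg_duration_pass
```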
To support this, we're exposing a new parameter called `max_duration_pass` as part of the task runtime history, and we need to update the Python buildscript code in the mongo repo to expect this new parameter; a sketch of the expected shape of that change follows. The task generation work will be done in a separate ticket.
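A sketch of the kind of change involved, assuming the history is parsed into a simple stats object (the class shape and field names other than `max_duration_pass` are assumptions, not the actual buildscript model):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TestRuntime:
    """Runtime history stats for a single test (hypothetical shape)."""

    test_name: str
    avg_duration_pass: float  # existing metric: average recent pass time, in seconds
    max_duration_pass: Optional[float] = None  # new metric: recent max pass time

    @classmethod
    def from_json(cls, doc: Dict[str, Any]) -> "TestRuntime":
        # Treat max_duration_pass as optional so older history documents
        # that predate the new field still parse cleanly.
        return cls(
            test_name=doc["test_name"],
            avg_duration_pass=doc["avg_duration_pass"],
            max_duration_pass=doc.get("max_duration_pass"),
        )
```

Keeping the new field optional means the buildscripts won't break against history endpoints or cached documents that haven't started returning `max_duration_pass` yet.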