- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Performance Benchmarking
- Go Drivers
Context
Right now, we compare the results of the benchmarks run in the "perf" task with the "stable region" from a set of previous waterfall builds, and post the diff of those timings on PRs. That comparison has proved to be quite noisy and isn't providing any value. We need to find a way to reduce the noise so that only real performance diffs are called out.
At least one source of noise is that individual benchmark runs often differ significantly from the baseline, but that difference doesn't stay consistent across runs. To make sure the results are valid, we need to collect data from multiple runs (5 or more) of the "perf" task and check whether they form a new stable region. I recommend starting by running the benchmarks in the "perf" task 5 or more times and comparing all of the new results to see whether they form a new stable region.
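A minimal sketch of what that stability check could look like, assuming "stable" means the relative standard deviation across runs stays below some threshold. The function name, the coefficient-of-variation metric, and the 5% cutoff are all illustrative assumptions, not the actual criterion used by our tooling.

```go
package main

import (
	"fmt"
	"math"
)

// isStableRegion reports whether a set of per-run timings is consistent
// enough to treat as a single stable region. Here "stable" is assumed to
// mean the relative standard deviation stays under maxRelStdDev.
func isStableRegion(timings []float64, maxRelStdDev float64) bool {
	if len(timings) < 2 {
		return false
	}
	var sum float64
	for _, t := range timings {
		sum += t
	}
	mean := sum / float64(len(timings))

	var variance float64
	for _, t := range timings {
		variance += (t - mean) * (t - mean)
	}
	stdDev := math.Sqrt(variance / float64(len(timings)))

	return stdDev/mean <= maxRelStdDev
}

func main() {
	// Example: timings (in seconds) from 5 hypothetical "perf" task runs.
	runs := []float64{1.02, 0.98, 1.01, 0.99, 1.03}
	fmt.Println("stable:", isStableRegion(runs, 0.05))
}
```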
Definition of done
- Run the benchmarks in the "perf" task 5 or more times, reporting all results.
  - It's possible the simplest way to implement this is to run the "perf" task itself 5 times, but it's not yet clear how to do that.
- Update the perfcomp tool to fetch stats from multiple benchmark runs per task and compare the waterfall and patch build stat series to see whether there's a new stable region (a rough sketch follows this list).
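A rough sketch of the kind of comparison perfcomp could make once it has both stat series. The types, the mean/standard-deviation comparison, and the factor of 2 on the pooled noise are assumptions for illustration; the real tool may use a different statistical test or threshold.

```go
package main

import (
	"fmt"
	"math"
)

// series holds the timings from multiple runs of one benchmark,
// either from the waterfall baseline or from the patch build.
type series struct {
	name    string
	timings []float64 // one entry per benchmark run
}

// stats returns the mean and standard deviation of a set of timings.
func stats(xs []float64) (mean, stdDev float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		stdDev += (x - mean) * (x - mean)
	}
	stdDev = math.Sqrt(stdDev / float64(len(xs)))
	return mean, stdDev
}

// significantDiff reports whether the patch series differs from the
// waterfall series by more than the combined run-to-run noise.
// The factor of 2 on the pooled noise is an illustrative choice.
func significantDiff(waterfall, patch series) (diff float64, significant bool) {
	wMean, wStd := stats(waterfall.timings)
	pMean, pStd := stats(patch.timings)
	diff = pMean - wMean
	noise := 2 * math.Hypot(wStd, pStd)
	return diff, math.Abs(diff) > noise
}

func main() {
	waterfall := series{"waterfall", []float64{1.00, 1.02, 0.99, 1.01, 1.00}}
	patch := series{"patch", []float64{1.10, 1.12, 1.09, 1.11, 1.10}}
	diff, sig := significantDiff(waterfall, patch)
	fmt.Printf("diff=%.3fs significant=%v\n", diff, sig)
}
```

The idea is that a diff only gets called out on a PR when it exceeds the noise observed within each stat series, rather than whenever a single run happens to differ from the baseline.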
Pitfalls
What should the implementer watch out for? What are the risks?