The benchmarks in benchmarks_sep currently run with 1, 2, 4, 8, 16, and 32 threads, but we only track the single-threaded case, which is also the least noisy.
Some regressions may only be caught with multiple threads. Spinlock contention and lock-free loops should be caught by the retired instruction count, since spinning or retrying threads keep executing instructions. Mutex contention won't be, since blocked threads largely sleep instead of executing instructions.
I suggest we update the benchmarks to run at 1 and 16 threads only, which covers both the single-threaded and multi-threaded cases. 16 threads seems to give the best balance of signal to noise (32 threads is quite noisy), and it matches the 16 vCPUs on the VMs we currently run on.
The main motivation for reducing the number of thread counts we run at is to save time, since this task runs on every commit and in PR checks.
Acceptance Criteria:
- Update benchmarks_sep to only run at 1 and 16 threads.
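
As a rough illustration of the scope of the change, here is a minimal Python sketch that assumes the thread counts live in a single list that the runner iterates over. The actual layout and language of benchmarks_sep may differ, and `THREAD_COUNTS`, `run_benchmark`, and `run_all` are hypothetical names, not taken from the benchmark suite.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical setting: previously (1, 2, 4, 8, 16, 32), reduced per this ticket.
THREAD_COUNTS = (1, 16)


def run_benchmark(workload, threads):
    """Run `workload` once on each of `threads` worker threads.

    Returns the elapsed wall-clock time in seconds. A real harness would
    record retired instruction counts instead; wall time is a stand-in here.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(workload) for _ in range(threads)]
        for future in futures:
            future.result()  # re-raise any exception from the worker
    return time.perf_counter() - start


def run_all(workload, thread_counts=THREAD_COUNTS):
    """Run the workload at every configured thread count."""
    return {n: run_benchmark(workload, n) for n in thread_counts}
```

If each thread count takes roughly the same time to run (an assumption, not something measured here), going from six configurations to two would cut the task to about a third of its current duration.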