- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Workload Resilience
In a steady-state workload, constantly adding and removing tickets makes throughput spiky, especially at the default gain, where each step adds or removes 20% of our tickets.
It would be nice to keep some history of whether our attempts to probe up and down resulted in improvements. If adding and removing tickets hasn't produced any improvement in recent history, we should probe with a smaller gain. If the workload shifts and our probing attempts start to succeed, we would increase the gain at each step until we reach the point of diminishing returns, at which point the cycle repeats. This would result in a more gradual ramp-up and faster ramp-down.
The simplest way to do this would be to look at the previous probe result and, if it produced no improvement, reduce the gain to a smaller value. A slightly more sophisticated strategy would be to keep an exponentially decaying moving average of the probe success rate and use that as an input when choosing the gain.
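To make the two strategies concrete, here is a minimal Python sketch, not the server's implementation. Only the 20% default gain comes from this ticket. The names `gain_from_last_probe` and `EmaGain`, the 2% floor, the smoothing factor of 0.3, the initial success rate, and the linear mapping from success rate to gain are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

DEFAULT_GAIN = 0.20  # from the ticket: 20% of tickets added/removed per step
MIN_GAIN = 0.02      # assumed floor for the gain; not specified in the ticket


def gain_from_last_probe(last_probe_improved: bool) -> float:
    """Simplest strategy: fall back to a smaller gain if the last probe did not help."""
    return DEFAULT_GAIN if last_probe_improved else MIN_GAIN


@dataclass
class EmaGain:
    """Hypothetical helper: derives the gain from an EMA of probe success."""

    alpha: float = 0.3         # assumed smoothing factor (weight of the newest probe)
    success_rate: float = 1.0  # assumed start: behave like the current default gain

    def record(self, improved: bool) -> None:
        # Exponentially decaying moving average of outcomes (1 = improved, 0 = not).
        outcome = 1.0 if improved else 0.0
        self.success_rate = self.alpha * outcome + (1.0 - self.alpha) * self.success_rate

    def gain(self) -> float:
        # Assumed linear interpolation between the floor and the default gain.
        return MIN_GAIN + (DEFAULT_GAIN - MIN_GAIN) * self.success_rate


if __name__ == "__main__":
    controller = EmaGain()
    # Steady state: probes keep failing, so the gain decays toward MIN_GAIN.
    for _ in range(10):
        controller.record(False)
    print(f"after 10 failed probes: gain = {controller.gain():.3f}")
    # Workload shift: probes start succeeding, so the gain climbs back up.
    for _ in range(5):
        controller.record(True)
    print(f"after 5 successful probes: gain = {controller.gain():.3f}")
```

With these assumed constants, a long run of failed probes drives the gain toward the 2% floor, and each successful probe after a workload shift moves it back toward 20%. The smoothing factor controls how quickly the gain reacts in either direction and would need tuning against real workloads.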