- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Optimization
What we observed from local testing this summer is that when we run queries that examine a higher number of keys (i.e., high hundreds to thousands), the linear regression estimates the startup cost as 0, because the much larger execution time of the entire query makes the startup cost negligible. This is also in line with what we found during the Bonsai cost calibration. Therefore, we use smaller queries for calibration, and use the linear regression to extrapolate the incremental cost of examining, for example, the thousandth key.
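A minimal sketch of the current single-regression approach, assuming the calibration scripts use Python with NumPy; the measurements and numbers below are hypothetical, not real calibration data:

```python
import numpy as np

# Hypothetical measurements: keys examined vs. execution time (microseconds).
# Model: time ~= startup_cost + incremental_cost * keys_examined
keys = np.array([100, 500, 1000, 2000, 5000])
times = np.array([210.0, 1020.0, 2050.0, 4100.0, 10200.0])

# Least-squares line fit; slope = incremental cost, intercept = startup cost.
incremental_cost, startup_cost = np.polyfit(keys, times, deg=1)

# At key counts this large, the total execution time dominates the fit, so
# the intercept (startup cost) tends toward zero and is unreliable.
print(f"startup: {startup_cost:.2f}, incremental: {incremental_cost:.4f}")
```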
However, it is more important for the incremental cost to be accurate in the case where we are scanning thousands of keys or more than for the startup cost to be accurate when we are scanning 100 keys. One idea to address this is to use a set of queries that return a smaller result set to calibrate the startup cost (i.e., after fitting the linear regression, we discard the incremental cost) and a set of queries that return a larger result set to calibrate the incremental cost (we discard the startup cost from this regression); see the sketch below. This will require some refactoring of the scripts.
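A sketch of the proposed split calibration, under the same assumptions as above (Python/NumPy scripts, hypothetical data); the fit_line helper is illustrative, not an existing function in the calibration scripts:

```python
import numpy as np

def fit_line(keys, times):
    """Least-squares fit of time = startup + incremental * keys.
    Returns (startup_cost, incremental_cost)."""
    slope, intercept = np.polyfit(keys, times, deg=1)
    return intercept, slope

# Small-result queries: trust only the intercept (startup cost).
small_keys = np.array([1, 2, 5, 10, 20])
small_times = np.array([55.0, 57.0, 63.0, 74.0, 95.0])
startup_cost, _ = fit_line(small_keys, small_times)

# Large-result queries: trust only the slope (incremental cost).
large_keys = np.array([1000, 2000, 5000, 10000])
large_times = np.array([2100.0, 4150.0, 10300.0, 20500.0])
_, incremental_cost = fit_line(large_keys, large_times)

print(f"startup: {startup_cost:.2f}, incremental: {incremental_cost:.4f}")
```

Each regime then contributes only the parameter it measures well: small result sets isolate the fixed per-query overhead, while large result sets isolate the per-key slope.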
This issue came up for the index scan node calibration, but it is relevant to at least sort nodes as well. If we decide to do this ticket, we should revisit all the nodes we currently have workloads for to determine whether they should use this approach.