- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Optimization
What we observed from local testing this summer is that when we run queries that examine a higher number of keys (i.e., high hundreds to thousands), the linear regression estimates the startup cost as 0, because the much larger execution time of the entire query makes the startup cost negligible. This is also in line with what we found during the Bonsai cost calibration. Therefore, we use smaller queries for calibration, and use the linear regression to extrapolate the incremental cost of examining, for example, the thousandth key.
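A minimal sketch of the current single-regression approach, assuming the calibration scripts use Python with NumPy; the measurements and numbers below are hypothetical, not real calibration data:

```python
import numpy as np

# Hypothetical measurements: keys examined vs. execution time (microseconds).
# Model: time ~= startup_cost + incremental_cost * keys_examined
keys = np.array([100, 500, 1000, 2000, 5000])
times = np.array([210.0, 1020.0, 2050.0, 4100.0, 10200.0])

# Least-squares line fit; slope = incremental cost, intercept = startup cost.
incremental_cost, startup_cost = np.polyfit(keys, times, deg=1)

# At key counts this large, the total execution time dominates the fit, so
# the intercept (startup cost) tends toward zero and is unreliable.
print(f"startup: {startup_cost:.2f}, incremental: {incremental_cost:.4f}")
```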
However, it is more important for the incremental cost to be accurate in the case where we are scanning thousands of keys or more than for the startup cost to be accurate when we are scanning 100 keys. One idea to address this is to use a set of queries that return a smaller result set to calibrate the startup cost (i.e., after fitting the linear regression, we discard the incremental cost) and a set of queries that return a larger result set to calibrate the incremental cost (we discard the startup cost from this regression); see the sketch below. This will require some refactoring of the scripts.
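A sketch of the proposed split calibration, under the same assumptions as above (Python/NumPy scripts, hypothetical data); the fit_line helper is illustrative, not an existing function in the calibration scripts:

```python
import numpy as np

def fit_line(keys, times):
    """Least-squares fit of time = startup + incremental * keys.
    Returns (startup_cost, incremental_cost)."""
    slope, intercept = np.polyfit(keys, times, deg=1)
    return intercept, slope

# Small-result queries: trust only the intercept (startup cost).
small_keys = np.array([1, 2, 5, 10, 20])
small_times = np.array([55.0, 57.0, 63.0, 74.0, 95.0])
startup_cost, _ = fit_line(small_keys, small_times)

# Large-result queries: trust only the slope (incremental cost).
large_keys = np.array([1000, 2000, 5000, 10000])
large_times = np.array([2100.0, 4150.0, 10300.0, 20500.0])
_, incremental_cost = fit_line(large_keys, large_times)

print(f"startup: {startup_cost:.2f}, incremental: {incremental_cost:.4f}")
```

Each regime then contributes only the parameter it measures well: small result sets isolate the fixed per-query overhead, while large result sets isolate the per-key slope.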
This issue came up for the index scan node calibration, but it is relevant to at least sort nodes as well. If we decide to do this ticket, we should revisit all the nodes we currently have workloads for to determine whether they should use this approach.