-
Type: Improvement
-
Resolution: Won't Do
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
Filing this ticket to capture some ideas discussed at the Bonsai engineering sync. This ticket contains many ideas and I fully expect this ticket to split off into several tickets (or its own PM) in the future.
The consensus of the meeting was that it seems fairly important to do our best to ensure that the same query run multiple times results in the same plan being chosen. This can help customers and support teams find issues faster; the classic engine has the same problem.
The main way this can manifest in Bonsai is when two plans have identical costs according to the cost model and we pick the one explored first. If two rewrites are added to the queue with equal priority (and result in identical costs), they may be explored in a different order, resulting in a different plan being selected on subsequent runs of the same query. There are also other instances in the optimizer that use hash-based containers that do no guarantee stable iteration order that can result in similar outcomes.
Some potential solutions discussed:
- Deterministic cost tie-breaking by using a lexicographical ordering of the plan itself.
- Assign/enforce unique priorities to all rewrites so that they are always performed in the same order.
The other way this problem can manifest is through changes in CE, especially sampling-based CE. Using random sampling of the collection can result in different CE estimates on the same predicates.
Some potential solutions discussed:
- Use sequential sampling (read from the start of the collection)
- Start the random cursor used for sampling with the same seed on each run
Lastly, this problem affects our testing strategy for bonsai. The most common unit test will construct some input logical ABT, invoke the optimizer, and assert on the outputted physical ABT. The order of rewrites which are invoked changes the projection names which are generated, meaning that these tests can non-deterministically fail due to equivalent plans but different projection names.
Some potential solutions discussed:
- Change tests to only assert on the structure that they want to verify
- Perform some plan canonicalization which will deterministically rename projections before asserting on plan explain equality
- [Overlapping with ideas above] Ensure that rewrites are applied in consistent order
- is related to
-
SERVER-80202 [CQF] Non-deterministic projection names generated in physical optimizer test breaking golden test
- Closed
-
SERVER-80808 [CQF] Projection rename phase
- Closed
-
SERVER-81052 [CQF] More robust picking of best plan
- Closed