[SERVER-80277] [CQF] Make plan selection as deterministic (repeatable) as possible Created: 21/Aug/23  Updated: 21/Sep/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ben Shteinfeld Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-80202 [CQF] Non-deterministic projection na... Closed
is related to SERVER-80808 [CQF] Projection rename phase Closed
is related to SERVER-81052 [CQF] More robust picking of best plan Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

Filing this ticket to capture some ideas discussed at the Bonsai engineering sync. This ticket contains many ideas and I fully expect this ticket to split off into several tickets (or its own PM) in the future.

The consensus of the meeting was that it seems fairly important to do our best to ensure that the same query run multiple times results in the same plan being chosen. This can help customers and support teams find issues faster; the classic engine has the same problem.

The main way this can manifest in Bonsai is when two plans have identical costs according to the cost model and we pick the one explored first. If two rewrites are added to the queue with equal priority (and result in identical costs), they may be explored in a different order, resulting in a different plan being selected on subsequent runs of the same query. There are also other instances in the optimizer that use hash-based containers that do no guarantee stable iteration order that can result in similar outcomes.

Some potential solutions discussed:

  • Deterministic cost tie-breaking by using a lexicographical ordering of the plan itself.
  • Assign/enforce unique priorities to all rewrites so that they are always performed in the same order.

The other way this problem can manifest is through changes in CE, especially sampling-based CE. Using random sampling of the collection can result in different CE estimates on the same predicates.

Some potential solutions discussed:

  • Use sequential sampling (read from the start of the collection)
  • Start the random cursor used for sampling with the same seed on each run

Lastly, this problem affects our testing strategy for bonsai. The most common unit test will construct some input logical ABT, invoke the optimizer, and assert on the outputted physical ABT. The order of rewrites which are invoked changes the projection names which are generated, meaning that these tests can non-deterministically fail due to equivalent plans but different projection names.

Some potential solutions discussed:

  • Change tests to only assert on the structure that they want to verify
  • Perform some plan canonicalization which will deterministically rename projections before asserting on plan explain equality
  • [Overlapping with ideas above] Ensure that rewrites are applied in consistent order

Generated at Thu Feb 08 06:43:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.