[SERVER-79622] [CQF] Improve intersectDNFIntervals behavior for variable bounds Created: 02/Aug/23 Updated: 06/Feb/24 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matt Boros | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||
| Sprint: | QO 2023-09-18, QO 2023-10-02, QO 2023-10-16, QO 2023-10-30, QO 2023-11-13, QO 2023-11-27, QO 2023-12-11, QO 2023-12-25 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Description |
|
intersectDNFIntervals produces an overly complicated result for simple inputs: Input:
Output:
The optimal result from this function would be a single interval, (getParam(0), ""). intersectDNFIntervals likely does not have special cases for intervals where the low bound is MinKey or the high bound is MaxKey. The number of intervals unioned in the result approximately doubles for each additional variable interval. We get 6 intervals with 3 variable intervals, 14 intervals for 4 variable intervals, etc. This is likely because the function intersects two intervals at a time rather than analyzing the intervals all at once for a better result. This is relevant for parameterization of match expressions for the plan cache. |
| Comments |
| Comment by David Percy [ 02/Aug/23 ] | |||||||||||||||||
|
To apply this idea to the example in this ticket: we're starting with this conjunction:
With half-intervals it would be represented this way instead:
But [minKey and maxKey] are both trivially-true, so we'd drop them:
And optionally rearrange:
| |||||||||||||||||
| Comment by David Percy [ 02/Aug/23 ] | |||||||||||||||||
|
One thing I've been wondering is whether it would be simpler to deal with half-intervals. An interval is a conjunction of two half-intervals, and we already have a DNF. A half-interval has only one endpoint, so there are fewer cases to consider. We wouldn't need auxiliary intervals. For example, take this intersection of 4 intervals:
There are many combinations of inclusive/exclusive bounds to consider. Now instead think of it as a conjunction of 8 half-intervals:
This lets you rearrange them:
And then it's easy to combine half-intervals with the same inclusivity:
Any conjunction of intervals can be normalized this way:
We don't need auxiliary intervals to represent the fact that the inclusivity depends on the values of a,b,c,d. When lowering to an expression, we would still have some conditionals comparing a,b and c,d:
and these would be constant-folded once a,b,c,d are known. I'm not sure how to convert these to compound intervals (where each endpoint is a tuple), but probably a similar conditional would work. Ultimately you want something like:
|