[SERVER-79622] [CQF] Improve intersectDNFIntervals behavior for variable bounds Created: 02/Aug/23  Updated: 06/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Matt Boros Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-83147 [CQF] Support constant folding of NaN... Closed
depends on SERVER-83427 [CQF] Improve constant folding for An... Closed
is depended on by SERVER-81958 M4. Meet Performance Targets Closed
Related
is related to SERVER-79377 POC Parameterize MatchExpression and ... Closed
Assigned Teams:
Query Optimization
Sprint: QO 2023-09-18, QO 2023-10-02, QO 2023-10-16, QO 2023-10-30, QO 2023-11-13, QO 2023-11-27, QO 2023-12-11, QO 2023-12-25
Participants:

 Description   

intersectDNFIntervals produces an overly complicated result for simple inputs:

Input:

{{
    {<Const [""]}
 ^ 
    {>FunctionCall [getParam] Const [0]}
}}

Output:

{
    {{[If [] BinaryOp [And] BinaryOp [And] BinaryOp [Or] BinaryOp [And] BinaryOp [Lt] FunctionCall [getParam] Const [0] Const [""] Const [true] BinaryOp [Or] BinaryOp [And] BinaryOp [Lt] Const [minKey] FunctionCall [getParam] Const [0] BinaryOp [Lt] FunctionCall [getParam] Const [0] Const [""] Const [false] BinaryOp [Lt] FunctionCall [getParam] Const [0] Const [maxKey] Const [false] Const [minKey] Const [maxKey], Const [minKey]]}}
 U 
    {{[Const [maxKey], If [] BinaryOp [And] BinaryOp [And] BinaryOp [Or] BinaryOp [And] BinaryOp [Lt] FunctionCall [getParam] Const [0] Const [""] Const [true] BinaryOp [Or] BinaryOp [And] BinaryOp [Lt] Const [minKey] FunctionCall [getParam] Const [0] BinaryOp [Lt] FunctionCall [getParam] Const [0] Const [""] Const [false] BinaryOp [Lt] FunctionCall [getParam] Const [0] Const [maxKey] Const [false] Const [maxKey] Const [minKey]]}}
 U 
    {{(If [] BinaryOp [Gte] Const [minKey] FunctionCall [getParam] Const [0] Const [minKey] FunctionCall [getParam] Const [0], Const [""])}}
}

The optimal result from this function would be a single interval, (getParam(0), ""). intersectDNFIntervals likely does not have special cases for intervals where the low bound is MinKey or the high bound is MaxKey.

The number of intervals unioned in the result approximately doubles for each additional variable interval. We get 6 intervals with 3 variable intervals, 14 intervals for 4 variable intervals, etc. This is likely because the function intersects two intervals at a time rather than analyzing the intervals all at once for a better result.

This is relevant for parameterization of match expressions for the plan cache.



 Comments   
Comment by David Percy [ 02/Aug/23 ]

To apply this idea to the example in this ticket: we're starting with this conjunction:

[minKey, "") ^ (param0, maxKey]

With half-intervals it would be represented this way instead:

[minKey  ^  "")  ^  (param0  ^  maxKey] 

But [minKey and maxKey] are both trivially-true, so we'd drop them:

"") ^ (param0

And optionally rearrange:

(param0  ^  "")

Comment by David Percy [ 02/Aug/23 ]

One thing I've been wondering is whether it would be simpler to deal with half-intervals. An interval is a conjunction of two half-intervals, and we already have a DNF. A half-interval has only one endpoint, so there are fewer cases to consider. We wouldn't need auxiliary intervals.

For example, take this intersection of 4 intervals:

[lo1, hi1] ^ (lo2, hi2) ^ (lo3, hi3] ^ [lo4, hi4)

There are many combinations of inclusive/exclusive bounds to consider.

Now instead think of it as a conjunction of 8 half-intervals:

[lo1  ^  hi1]  ^  (lo2  ^  hi2)  ^  (lo3  ^  hi3]  ^  [lo4  ^  hi4)

This lets you rearrange them:

// all the lower bounds together
[lo1  ^  [lo4  ^  (lo2  ^  (lo3 
^
// all the upper bounds together
hi1]  ^  hi3]  ^  hi2)  ^  hi4)

And then it's easy to combine half-intervals with the same inclusivity:

[max(lo1,lo4) ^ (max(lo2,lo3)
^
min(hi1,hi3)]  ^  min(hi2,hi4))

Any conjunction of intervals can be normalized this way:

[a ^ (b
^
c]  ^  d)

We don't need auxiliary intervals to represent the fact that the inclusivity depends on the values of a,b,c,d.

When lowering to an expression, we would still have some conditionals comparing a,b and c,d:

(a > b) ? a <= x : b < x
&&
(d < c) ? x < d : x <= c

and these would be constant-folded once a,b,c,d are known.

I'm not sure how to convert these to compound intervals (where each endpoint is a tuple), but probably a similar conditional would work. Ultimately you want something like:

ixscan lowbound=(condition1 ? ks1 : ks2) highbound=(condition2 ? ks3 : ks4)

Generated at Thu Feb 08 06:41:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.