-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
Fully Compatible
-
QO 2024-09-30, QO 2024-10-14
This ticket aims to support type-bracketed interval in histogram CE estimation. For example, given a find query {a: {$gt: 123}} , the generated type-bracketed interval is [123, "").
We should relax the constraint in the interface implementation which requires both bounds to share the same type as long as one of the bounds is meant to bracket the type of the other bound.
We could reuse some of the code from previous prototype on mixed-type interval implementation.
Difference between the changes and the Bonsai code
Reusing the code from Bonsai
The code originates from analyzeIntervalEstimationMode which checks if an interval is estimable via histograms or not. However, to decouple from Bonsai such as ABT, this ticket needs to make some modification:
- Changes getMinMaxBoundForType to getMinMaxBoundForSBEType to avoid ABT.
- isIntervalSubsetOfType is not reused due to its dependency on IntervalRequirement and intersectDNFIntervals. Instead, this ticket introduces sameTypeBracketedInterval which only allows type-bracketed intervals.
- Removes heuristics mode kFallbackMode for CBR as discussed. Due to reduced choices, this ticket removes the extra call to `analyzeIntervalEstimationMode` by inlining the if-else statements at the call site estimateInterval.
About sameTypeBracketedInterval
sameTypeBracketedInterval() changes the way how we accept type-bracketed interval. Originally, the type-bracketed interval is accepted by the rule checking either bound is histogrammable. So a type-bracketed interval like ["abc", {}) will be accepted because one of the bounds is histogrammable.
However, the following non-type-bracketed edge cases are also accepted by the same rule which may yield underestimated results.
- ["abc", \{a: 1}] : This may be underestimated because it will be analyzed as `kUseHistogram` and later deferred to estimateRange. Given that there are no buckets for objects in histograms, the estimateRange will miss the cardinality between {} and {a: 1}.
- ["abc", ObjectId("....")] : This may be underestimated for the same reason above (i.e. analyzed with kHistogram and estimated via estimateRange). It misses the estimates in between such as arrays, objects, etc.
The above intervals can originate from either $nin like {$nin: ["abc", ObjectId("...")]} or $expr on non-multikey paths.
Therefore, to avoid underestimation, this ticket only allows type-bracketed intervals to be estimated and hence replace those either-one-histogrammable rule with sameTypeBracketedInterval().