- Type: Task
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Query Optimization
- Fully Compatible
Implement histogram estimation for type counts. We apply this estimation when neither bound of an interval is histogrammable.
1. Intervals with non-histogrammable but estimable types
These types include the following values: true, false, null, NaN, and the empty array []. Each has its own counter, so the exact number of occurrences is known.
For example, consider the data in a collection:
> db.coll.find()
{a: true}
{a: false}
{a: null}
{a: []}
{a: NaN}
Intervals that could match documents in this collection:
(false, true]    // find({a: {$gt: false}})
[true, true]     // find({a: {$gte: true}})
[null, null]     // find({a: {$lte: null}})
[nan.0, nan.0]   // find({a: {$gte: NaN}})
[[], []]         // find({a: {$eq: []}})
Edge cases we want to handle correctly:
[false, false)   // empty interval
(false, false]   // empty interval. This may be normalized to [false, false) first?
[nan.0, nan.0)   // likewise, empty interval
[[], [123]]      // rejected due to non-point interval
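The cases above can be sketched roughly as follows. This is a minimal illustration in Python, not the server implementation: the `counts` dictionary, the `_key` helper, and `estimate_type_count_interval` are hypothetical names, and the sketch assumes booleans order as false < true and that any unsupported or mixed-type interval is simply rejected by returning `None`. It returns a cardinality directly rather than a selectivity.

```python
import math

def _key(value):
    """Map a bound to its counter name, or None if it has no counter (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    if isinstance(value, list) and not value:
        return "[]"
    return None  # e.g. [123] or any histogrammable value

def estimate_type_count_interval(counts, low, high, low_inclusive=True, high_inclusive=True):
    """Return the estimated cardinality, or None if the interval is rejected."""
    lo, hi = _key(low), _key(high)
    if lo is None or hi is None:
        return None  # e.g. [[], [123]]: non-point array interval
    bools = ("false", "true")
    if lo in bools and hi in bools:
        i, j = bools.index(lo), bools.index(hi)
        total = 0.0
        for k, name in enumerate(bools):
            above_low = k > i or (k == i and low_inclusive)
            below_high = k < j or (k == j and high_inclusive)
            if above_low and below_high:
                total += counts.get(name, 0.0)
        return total
    if lo != hi:
        return None  # mixed-type interval, out of scope for this sketch
    # Point interval on null, NaN, or []: empty unless both bounds are inclusive.
    return counts.get(lo, 0.0) if (low_inclusive and high_inclusive) else 0.0

# Counters for the example collection above: one document per value.
counts = {"true": 1.0, "false": 1.0, "null": 1.0, "nan": 1.0, "[]": 1.0}
print(estimate_type_count_interval(counts, False, True, low_inclusive=False))    # (false, true]
print(estimate_type_count_interval(counts, False, False, high_inclusive=False))  # [false, false)
print(estimate_type_count_interval(counts, [], [123]))                           # rejected
```

Treating `(false, false]` and `[false, false)` uniformly as empty avoids needing a separate normalization step in this sketch; whether the real code normalizes first is left open, as in the ticket.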
2. Intervals from $type query
See the $type documentation for the full list of types.
["", {})         // find({a: {$type: 'string'}})
[nan.0, inf.0]   // find({a: {$type: 'number'}})
[{}, [])         // find({a: {$type: 'object'}})
[false, true]    // find({a: {$type: 'bool'}})
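Once an interval has been recognized as covering a whole type bracket, the estimate reduces to a lookup in the per-type counts. The following Python sketch illustrates that idea only; `type_counts`, `NUMBER_TYPES`, and `estimate_type_query` are hypothetical names, and it assumes the counts are keyed by $type alias. The only non-trivial case shown is the `number` alias, which in $type matches double, int, long, and decimal.

```python
# The $type alias "number" matches these four BSON numeric types.
NUMBER_TYPES = ("double", "int", "long", "decimal")

def estimate_type_query(type_counts, type_name):
    """Return the estimated cardinality for find({a: {$type: type_name}}).

    type_counts is a hypothetical dict mapping a $type alias to the number
    of values of that type seen for the field.
    """
    if type_name == "number":
        return float(sum(type_counts.get(t, 0) for t in NUMBER_TYPES))
    return float(type_counts.get(type_name, 0))

# Made-up counts for illustration only.
type_counts = {"string": 3, "double": 2, "int": 5, "object": 1, "bool": 4}
print(estimate_type_query(type_counts, "number"))  # sums the double and int counts
print(estimate_type_query(type_counts, "string"))
```

As in the previous sketch, the result is a cardinality; no conversion to a selectivity is involved.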
NOTE: This ticket only addresses non-mixed-type intervals. Follow-up tickets address the following cases:
- Type-bracketed mixed-type interval (e.g. [123, "")): SERVER-94856
- Non-type-bracketed mixed-type interval (e.g. [null, true]): SERVER-94855
This ticket may be blocked by the interface implementation SERVER-88705.
Reference
We could reuse some code from this prototype.
1. We need a separate new PR.
2. We need to avoid using Selectivity. We pass Cardinality around directly, without converting it to Selectivity as the prototype does.