|
We should have integration and/or unit tests that exercise the following scenarios in histogram generation and in estimation of predicates:
- minimum and maximum values for each type (most importantly numeric)
- inf/NaN/invalid values: if these can be inserted into a collection, we have to make sure we handle them correctly during bucket creation and estimation
- a wide range of values including extreme types
- extreme date/time values
- Decimal128 types that are too large to fit in a double
- very large arrays
- very large strings
We need to ensure both that histogram creation on these types results in a valid histogram and that cardinality estimation for these values (whether they are present in or absent from the histogram) produces reasonable estimates.
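
As a rough illustration of the kind of inputs and checks described above, here is a minimal Python sketch. It is not the server's histogram code: `edge_case_doubles`, `fits_in_double`, `build_buckets`, and `is_valid_histogram` are hypothetical helpers, the equi-depth bucketing is a toy stand-in, and the validity rules (strictly increasing bounds, non-empty buckets, counts that add up) are assumptions about what "valid" should mean here.

```python
import math
import sys
from decimal import Decimal


def edge_case_doubles():
    """Boundary and special double values to feed to a histogram builder."""
    fi = sys.float_info
    return [-math.inf, -fi.max, -fi.min, 0.0, fi.min, fi.max, math.inf, math.nan]


def fits_in_double(d: Decimal) -> bool:
    """True if a finite decimal's magnitude is within the largest finite double."""
    return d.is_finite() and abs(d) <= Decimal(sys.float_info.max)


def build_buckets(values, num_buckets):
    """Toy equi-depth builder (illustrative only).

    Returns (buckets, nan_count), where buckets is a list of
    (upper_bound, count) pairs over the non-NaN values. NaN is counted
    separately because it has no place in a numeric ordering.
    """
    nan_count = sum(1 for v in values if math.isnan(v))
    ordered = sorted(v for v in values if not math.isnan(v))
    buckets = []
    if ordered:
        step = max(1, math.ceil(len(ordered) / num_buckets))
        for start in range(0, len(ordered), step):
            chunk = ordered[start:start + step]
            buckets.append((chunk[-1], len(chunk)))
    return buckets, nan_count


def is_valid_histogram(buckets, nan_count, total):
    """Assumed validity rules: strictly increasing bounds, no empty
    buckets, and every input value accounted for."""
    bounds = [bound for bound, _ in buckets]
    counts = [count for _, count in buckets]
    return (
        all(lo < hi for lo, hi in zip(bounds, bounds[1:]))
        and all(count > 0 for count in counts)
        and sum(counts) + nan_count == total
    )
```

A test along these lines would build a histogram from `edge_case_doubles()`, assert `is_valid_histogram(...)`, and separately use `fits_in_double` to pick Decimal128 values that cannot be represented as a double.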
|