- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- Query Optimization
- ALL
This bug was discovered in WRITING-28668. Usually, the NDV (number of distinct values) is taken to be the midpoint of the data interval [min, max]. For example, the NDV for integers is (min + max) / 2. This choice of NDV is usually sufficient.
However, for strings the data interval represents the string lengths, which are small numbers. For example, strings with lengths between 16 and 32 bytes yield an NDV of only 24. That caps the number of buckets in histogram generation and prevents us from observing the effect of the bucket count on performance.
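
The arithmetic above can be sketched as follows. This is a minimal illustration, not the actual generator code: the function names `ndv_from_interval` and `effective_buckets` are hypothetical, and the assumption that the bucket count is capped at `min(requested, ndv)` is inferred from the description rather than taken from the implementation.

```python
def ndv_from_interval(lo: int, hi: int) -> int:
    """Midpoint rule described above: NDV = (min + max) / 2."""
    return (lo + hi) // 2


def effective_buckets(requested: int, ndv: int) -> int:
    """Assumed cap: a histogram cannot use more buckets than distinct values."""
    return min(requested, ndv)


# For strings, the interval holds lengths in bytes, not values.
string_ndv = ndv_from_interval(16, 32)
print(string_ndv)                          # 24
print(effective_buckets(100, string_ndv))  # 24, even though 100 were requested
```

Under these assumptions, any requested bucket count above 24 collapses to 24 for this string field, so varying the bucket count has no observable effect.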