[SERVER-73033] Improve sparse histogram bucket equality estimate Created: 19/Jan/23 Updated: 24/Jan/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alya Berciu | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: | Query Optimization |
| Participants: | |||||||||
| Description |
|
This resulted from the investigation around

If we want to estimate a range (l, u) that falls within a bucket with lower bound L and upper bound U, ndv distinct values, and rangeFreq values in the bucket, we may obtain a negative estimate. This is because we estimate this range as:

card(<u) - card(<l)
  = card(<=u) - card(=u) - card(<l)
  = card(<=u) - card(<l) - rangeFreq/ndv

Here card(=u) is estimated as rangeFreq/ndv. If rangeFreq/ndv > card(<=u) - card(<l), we obtain a negative estimate. Since the two sides of this inequality are independent of each other, there is currently no guarantee that this case cannot occur.

The current fix ( |
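
To make the failure mode concrete, the following is a minimal Python sketch of the arithmetic described above. It is not the server's estimation code: the function name, parameter names, and input numbers are all hypothetical and chosen only so that rangeFreq/ndv exceeds card(<=u) - card(<l). The final clamp to zero is a naive guard included for illustration, not the fix referred to in this ticket.

```python
def estimate_range_within_bucket(card_le_u, card_lt_l, range_freq, ndv):
    """Estimate the range cardinality using the formula from the description.

    All names here are hypothetical; this is not the server implementation.
    """
    # Equality estimate inside the bucket: card(=u) ~ rangeFreq / ndv.
    card_eq_u = range_freq / ndv
    # card(<u) - card(<l) = card(<=u) - card(<l) - rangeFreq/ndv
    return card_le_u - card_lt_l - card_eq_u


# Hypothetical inputs: the bucket is sparse (few distinct values, many
# documents), so the equality estimate is large compared with the
# difference between the two range estimates.
raw_estimate = estimate_range_within_bucket(
    card_le_u=12.0,    # assumed estimate for card(<=u)
    card_lt_l=10.0,    # assumed estimate for card(<l)
    range_freq=100.0,  # assumed rangeFreq of the bucket
    ndv=20.0,          # assumed number of distinct values in the bucket
)

# Naive illustrative guard (an assumption, not the ticket's fix):
# never report a negative cardinality.
clamped_estimate = max(0.0, raw_estimate)

print(raw_estimate)      # 12 - 10 - 100/20 = -3.0
print(clamped_estimate)  # 0.0
```

With these inputs the equality estimate is 100/20 = 5, which is larger than 12 - 10 = 2, so the raw estimate is negative.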