[SERVER-73033] Improve sparse histogram bucket equality estimate Created: 19/Jan/23  Updated: 24/Jan/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alya Berciu Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-72899 Invalid cardinality estimate Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

This resulted from the investigation around SERVER-72899.

If we estimate the cardinality of a range (l, u) that falls entirely within a single bucket with lower bound L, upper bound U, ndv distinct values, and rangeFreq values in the bucket, we may obtain a negative estimate. This is because we estimate the range as:

card(<u) - card(<l) = card(<=u) - card(=u) - card(<l) = card(<=u) - card(<l) - rangeFreq/ndv

If rangeFreq/ndv > card(<=u) - card(<l), the estimate is negative. Because the two sides of this inequality are estimated independently of each other, nothing currently guarantees that this case cannot occur.

The current fix (SERVER-72899) is to clamp this to 0.0, but we can do better. See comments for more.
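
To make the failure mode concrete, here is a minimal Python sketch of the formula above. It is not the server's implementation: the Bucket type, the function names, and the sample numbers are all hypothetical, and it assumes card(<=u) and card(<l) are obtained by linear interpolation across the bucket, which is only one possible way to estimate them.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical stand-in for a histogram bucket, not the server's type.
    lower: float       # L
    upper: float       # U
    range_freq: float  # rangeFreq: number of values in the bucket
    ndv: float         # number of distinct values in the bucket


def card_below(b: Bucket, x: float) -> float:
    # Assumption: linear interpolation of the values below x inside the bucket.
    # Used here for both card(<=u) and card(<l).
    return b.range_freq * (x - b.lower) / (b.upper - b.lower)


def equality_estimate(b: Bucket) -> float:
    # card(=u) as written in the description: rangeFreq / ndv.
    return b.range_freq / b.ndv


def range_estimate(b: Bucket, l: float, u: float) -> float:
    # card(<=u) - card(<l) - rangeFreq/ndv, with no clamping.
    return card_below(b, u) - card_below(b, l) - equality_estimate(b)


def clamped_range_estimate(b: Bucket, l: float, u: float) -> float:
    # Clamp negative estimates to 0.0, as the current fix is described to do.
    return max(0.0, range_estimate(b, l, u))


# A sparse bucket: few distinct values, so rangeFreq/ndv is large.
bucket = Bucket(lower=0.0, upper=100.0, range_freq=100.0, ndv=4.0)
print(range_estimate(bucket, 10.0, 20.0))          # -15.0
print(clamped_range_estimate(bucket, 10.0, 20.0))  # 0.0
```

With these sample numbers the interpolated difference is 10 while rangeFreq/ndv is 25, so the unclamped estimate is -15 and the clamp hides it as 0.0.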

