- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
-
Query Optimization
-
3
The precision of a sampling estimate depends on the number of qualifying documents in the sample. When this number is small (fewer than 10, according to the literature), there are no guarantees on the precision of the estimate.
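To illustrate why a small qualifying count is a problem, here is a minimal sketch that computes the approximate relative standard error of a sampled selectivity. It assumes simple random sampling with a binomial model for the qualifying count, which the ticket does not state; the function name `relative_standard_error` is hypothetical.

```python
import math


def relative_standard_error(sample_size: int, qualifying: int) -> float:
    """Approximate relative standard error of a sampled selectivity.

    Assumption of this sketch (not from the ticket): the sample is a simple
    random sample, so the qualifying count is modelled as binomial.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 < qualifying <= sample_size:
        raise ValueError("qualifying must be in 1..sample_size")
    p = qualifying / sample_size
    # SE(p_hat) / p_hat = sqrt((1 - p) / (n * p)) = sqrt((1 - p) / qualifying)
    return math.sqrt((1.0 - p) / qualifying)


# The error shrinks roughly as 1 / sqrt(qualifying).
for k in (1, 5, 10, 100):
    print(k, relative_standard_error(1000, k))
```

Under this model, a sample of 1000 documents with 10 qualifying ones gives a relative standard error of about 0.31, and the error grows quickly as the qualifying count drops below that.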
This ticket aims to improve the situation in this case. There are several options to be investigated:
- When fewer than 10 documents qualify, round the count up to 10 and compute the estimate from that value. This does not improve the precision per se, but it provides an upper bound for the estimate, which can be sufficient for the CBR to pick a good plan.
- When fewer than 10 documents qualify, return the estimate together with an error status. This would allow the CBR, for example, to switch to histograms.
- Always assign a 'reliability' metric to the estimate, and use a low reliability for estimates derived from too few qualifying documents.
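The three options above can be sketched together as follows. This is only an illustration under stated assumptions, not the actual estimator: the names `Estimate`, `Status`, `estimate_cardinality` and the `clamp` flag are hypothetical, and the linear reliability ramp is an arbitrary choice made for this example. Only the threshold of 10 comes from the ticket.

```python
from dataclasses import dataclass
from enum import Enum

MIN_QUALIFYING = 10  # threshold mentioned in the ticket


class Status(Enum):
    OK = "ok"
    TOO_FEW_QUALIFYING = "too_few_qualifying"


@dataclass
class Estimate:
    cardinality: float
    status: Status      # option 2: lets the caller fall back, e.g. to histograms
    reliability: float  # option 3: 0.0 (unreliable) .. 1.0 (threshold met)


def estimate_cardinality(qualifying: int, sample_size: int,
                         collection_size: int, clamp: bool = False) -> Estimate:
    """Scale a sample count to the collection, flagging small counts."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= qualifying <= sample_size:
        raise ValueError("qualifying must be in 0..sample_size")

    too_few = qualifying < MIN_QUALIFYING
    effective = qualifying
    if clamp and too_few:
        # Option 1: round the count up to the threshold, but never beyond
        # the sample size itself.
        effective = min(MIN_QUALIFYING, sample_size)

    cardinality = effective * collection_size / sample_size
    status = Status.TOO_FEW_QUALIFYING if too_few else Status.OK
    # Hypothetical metric: a linear ramp up to the threshold.
    reliability = min(1.0, qualifying / MIN_QUALIFYING)
    return Estimate(cardinality, status, reliability)


raw = estimate_cardinality(3, 1000, 1_000_000)
clamped = estimate_cardinality(3, 1000, 1_000_000, clamp=True)
print(raw)
print(clamped)
```

A caller could inspect `status` or `reliability` to decide whether to trust the sampled estimate or to consult another source. Whether the options should be combined or treated as alternatives is left open by the ticket.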