Improve estimates for a small number of qualifying sample documents

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • 3

      The precision of a sampling-based estimate depends on the number of sample documents that qualify (match the predicate). When this number is small (the literature typically cites fewer than 10), there is no guarantee on the precision of the estimate.
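
      As a rough illustration of why small counts are a problem, the sketch below assumes that the number k of qualifying documents in a sample of n is binomially distributed. The relative standard error of the selectivity estimate k/n is then sqrt((1 - k/n) / k), which is roughly 1/sqrt(k) for low selectivity, so even 10 qualifying documents leave a relative error of about 30%. The function name is hypothetical and is not part of the MongoDB code base.

```python
import math


def relative_std_error(qualifying: int, sample_size: int) -> float:
    """Hypothetical helper: relative standard error of the selectivity estimate k/n.

    Models k (qualifying documents) as Binomial(n, p) with p estimated as k/n:
    CV = sqrt(p * (1 - p) / n) / p = sqrt((1 - p) / k).
    """
    if qualifying == 0:
        # No qualifying documents: the relative error is unbounded.
        return math.inf
    p = qualifying / sample_size
    return math.sqrt((1.0 - p) / qualifying)


# The error shrinks only with the square root of the qualifying count.
for k in (1, 3, 10, 100):
    print(k, round(relative_std_error(k, 1000), 2))
```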

      This ticket aims to improve estimates in this case. Several options should be investigated:

      • When fewer than 10 sample documents qualify, round the count up to 10 and compute the estimate from that value. This does not improve precision per se, but it provides an upper bound on the estimate, which may be sufficient for the CBR to pick a good plan.
      • When fewer than 10 sample documents qualify, return the estimate together with an error status. This would, for example, allow the CBR to fall back to histograms.
      • Always attach a 'reliability' metric to the estimate, and assign low reliability to estimates derived from too few qualifying samples.
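
      A minimal sketch of how the three options could fit together, assuming the estimate is computed as (qualifying / sample size) × collection size. All names (MIN_QUALIFYING, Status, Estimate, estimate_cardinality) and the linear reliability score are hypothetical illustrations, not the MongoDB implementation or a settled design.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical names throughout; this is not MongoDB code.
# Threshold below which the literature gives no precision guarantee.
MIN_QUALIFYING = 10


class Status(Enum):
    OK = "ok"
    TOO_FEW_QUALIFYING = "too_few_qualifying"


@dataclass
class Estimate:
    cardinality: float  # estimated number of matching documents in the collection
    status: Status      # option 2: lets the caller fall back, e.g. to histograms
    reliability: float  # option 3: 0.0 (unreliable) to 1.0 (reliable)


def estimate_cardinality(qualifying: int, sample_size: int, collection_size: int,
                         round_up: bool = True) -> Estimate:
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    too_few = qualifying < MIN_QUALIFYING
    # Option 1: treat a small count as MIN_QUALIFYING, giving a pessimistic (larger) estimate.
    count = MIN_QUALIFYING if (too_few and round_up) else qualifying
    # Cap selectivity at 1 in case the sample itself is smaller than MIN_QUALIFYING.
    cardinality = min(count / sample_size, 1.0) * collection_size
    status = Status.TOO_FEW_QUALIFYING if too_few else Status.OK
    # Option 3: hypothetical score that grows linearly with the count and saturates at 1.
    reliability = min(1.0, qualifying / MIN_QUALIFYING)
    return Estimate(cardinality, status, reliability)


e = estimate_cardinality(3, 1000, 1_000_000)
print(e.cardinality, e.status, e.reliability)
```

      In this sketch the options are independent: a caller can use the rounded-up cardinality as a pessimistic bound, check the status to decide whether to switch to histograms, or weigh competing plans by reliability.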

            Assignee:
            Unassigned
            Reporter:
            Milena Ivanova
            Votes:
            0
            Watchers:
            4
