  1. Core Server
  2. SERVER-73033

Improve sparse histogram bucket equality estimate

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization

      This resulted from the investigation around SERVER-72899.

      Suppose we want to estimate a range (l, u) that falls within a single histogram bucket with lower bound L, upper bound U, ndv distinct values, and rangeFreq values in the bucket. The estimate can come out negative, because we compute it as:

      card(<u) - card(<l) = card(<=u) - card(=u) - card(<l) = card(<=u) - card(<l) - rangeFreq/ndv

      If rangeFreq/ndv > card(<=u) - card(<l), the estimate is negative. The two sides of this inequality are independent, so nothing currently guarantees that this case cannot occur.

      The current fix (SERVER-72899) is to clamp this to 0.0, but we can do better. See comments for more.
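
The arithmetic above can be illustrated with a minimal Python sketch. This is not the server's implementation: the function names and the input numbers are hypothetical, and card(<=u) and card(<l) are passed in as already-computed values because the ticket does not say how they are derived. It only shows how the equality term rangeFreq/ndv can exceed the difference of the two range terms, and what clamping to 0.0 does in that case.

```python
def raw_range_estimate(card_le_u: float, card_lt_l: float,
                       range_freq: float, ndv: float) -> float:
    """card(<u) - card(<l), with card(=u) approximated as rangeFreq / ndv."""
    card_eq_u = range_freq / ndv  # per-distinct-value frequency in the bucket
    return card_le_u - card_lt_l - card_eq_u


def clamped_range_estimate(card_le_u: float, card_lt_l: float,
                           range_freq: float, ndv: float) -> float:
    """Same estimate, clamped at 0.0 (the behaviour described for SERVER-72899)."""
    return max(0.0, raw_range_estimate(card_le_u, card_lt_l, range_freq, ndv))


# Hypothetical sparse bucket: 100 values spread over only 2 distinct values,
# so the equality estimate is 50, while the range terms differ by only 10.
raw = raw_range_estimate(card_le_u=30.0, card_lt_l=20.0, range_freq=100.0, ndv=2.0)
clamped = clamped_range_estimate(card_le_u=30.0, card_lt_l=20.0, range_freq=100.0, ndv=2.0)
print(raw, clamped)
```

With these made-up inputs the raw estimate is 30 - 20 - 50 = -40, and the clamp replaces it with 0.0. A denser bucket (larger ndv) shrinks the equality term and keeps the raw estimate non-negative.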

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            alya.berciu@mongodb.com Alya Berciu