Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-100611

Better estimation of Index skip scan plans in CBR

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Consider the query 

      {a: 5, b: {$gt: 6}}} 

      and indexes

      {a: 1, b: 1}
      {b: 1, a: 1} 

      The two plans we generate with use either one of the indexes with point bounds on 'a' and a range on 'b'. The cost of these index scans is not the same! The (a,b) scan is significantly cheaper because there is a single index seek. The (b,a) scan is an index skip scan plan as it required seeking in the index # of unique values of b times.

      SERVER-97529 implemented a heuristic to handle this case only considering the bounds of the equality prefix of the scan, that is the (b, a) scan will be estimated as is the bounds were b: (6, inf] a: [MinKey, MaxKey].

      This fixed the plan selection for the case presented above, but it is fairly easy to construct a scenario where it leads to wildly inaccurate estimates. This ticket is to explore ideas to improve this estimation.

      One idea is estimating the NDV of each equality prefix to estimate the number of seeks the index scan will need to perform and calibrating a corresponding cost coefficient. Histograms are good at estimating NDV, but sampling may have problems here.

            Assignee:
            Unassigned Unassigned
            Reporter:
            ben.shteinfeld@mongodb.com Ben Shteinfeld
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              None
              None
              None
              None