[Join Optimization] Join predicate estimation can oscillate between different runs of the same query

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • ALL
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      When running TPC-H Q3 on scale factor 0.1, the optimizer must estimate the selectivity of the c_custkey = o_custkey predicate. This is done by estimating the NDV of c_custkey (we assume that customer is the primary table because it is smaller than orders). The current code will first check if the ndv of the column is the same as the size of the sample, in this case, we assume the column is unique and return the cardinality of the underlying collection. If not, then we perform Newton Raphson algorithm for estimating NDV from the sample.

      Because our current sampling mechanism does sampling from the collection with replacement, the sample may end up with duplicates of the same document. If this occurs, our NDV estimate will use NR instead of the collection cardinality. I've observed that the estimates from NR may vary widely and depend on the sample size, up to an order of magnitude underestimate of NDV, leading to order of magnitude overestimation of the cardinality of the customer-order join subset.

      This ticket tracks work to investigate this CE problem. Some ideas to consider are whether we can sample without replacement or use the collection cardinality as a parameter to the NDV estimation (currently it is ignored and we suspect that is makes it hard to distinguish cases that have a few duplicates but collection cardinalities differ).

      This ticket is related to SERVER-117085 which will solve this problem for cases where a unique index exists that we can use to improve estimation. However, this ticket is still useful to work on as a unique index may not always be present.

            Assignee:
            Unassigned
            Reporter:
            Ben Shteinfeld
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: