[Join Optimization] Avoid oscillating in and out of "avoid Newton-Raphson optimization" for defacto unique fields

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • Fully Compatible
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      When estimating join predicate selectivities, we use 1/NDV(join field of smaller coll). When estimating the NDV of the join field, we have an optimization that avoids using the Newton-Raphson algorithm for the estimate if the field is unique in the entire sample, in which case we return card(coll) as the NDV estimate. Essentially in optimization assumes that if every value in the sample is unique, the field itself is unique.

      However, our sampling infrastructure performs sampling with replacement, which means that it is possible for a unique field to have a duplicate in the sample (consider TPCH SF1 orders table with 1.5M rows and a sample size of 1k, there is ~28% of the sample having at least one duplicate row). When the sample contains a duplicate, the code uses NR, which comes up with an estimate for NDV that is several orders of magnitude different from card(coll). That causes a lot of plan instability because it dramatically changes the CE of subsets of the join graph. This leads to plan in stability between runs of the same query.

      Our current Method of moments estimator for NDV assumes that we perform sampling with replacement, so we cannot suddenly start perform sampling without replacement and maintain the same statistical guarantees. Instead, this ticket suggests that we de-duplicate the sample for the purposes of determining whether the current field is eligible for the "avoid NR" optimization and return card(coll) as the NDV estimate. This will allow defacto unique fields to get a consistent estimate.

            Assignee:
            Ben Shteinfeld
            Reporter:
            Ben Shteinfeld
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: