Handle the case of identical predicates in the same equivalence class

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      In phase two, we introduced inferring and propagating predicates across equivalence classes.

      This introduced a regression in join_cardinality_estimation_md.js because our cardinality estimation logic assumes that all predicates are independent but the predicates were now being counted not once, but x times where x is how many nodes there are in the equivalence class that the predicate belongs to.

       

      Here's an example of the regression in more detail:
      db.many_rows.aggregate([{"$lookup":{"from":"many_rows","as":"right","let":

      {"localField":"$i_idx"}

      ,"pipeline":[{"$match":{"$and":[{"$expr":{"$eq":["$$localField","$i_idx"]}},{"i_idx":1}]}}]}},{"$unwind":"$right"},{"$match":{}}])
      Before SERVER-117816, the first node in the join graph has a cardinality of 1000 and the second node in the join graph (which has a STP) has a cardinality of 1. The edge selectivity is 0.001.

      With the changes from SERVER-117816, we've propagated the second node's STP to the first node. So the first node has a cardinality of 1 and the second node does as well. The edge selectivity is still 0.001.

      In JoinCardinalityEstimator::getOrEstimateSubsetCardinality(), we multiply: # The selectivities from the reduced edge list.

      1. The base table cardinalities.
      2. The single-table predicate selectivities.

      so previous the estimate becomes 1000 * 1 * 0.001 = 1. But after SERVER-117816, we have 1 * 1 * 0.001 = 0.001.

      To fix this, we need to maintain which predicates are original to the user query and which were derived. Then to estimate the cardinality of a node in the join graph, we should use the filter that only includes predicates original to the node. But the node's access path in the query solution must include all predicates, derived and inferred.

            Assignee:
            Unassigned
            Reporter:
            Maddie Zechar
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: