Handle cardinalityEstimate of zero in the join optimizer

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      samplingCE will return an estimate of zero when a given predicate matches no rows in the sample. This however is not a guarantee that the collection does not contain any matching rows at all.

      A cardinalityEstimate of zero causes the cost to also go down to zero, meaning that this plan will always be picked, even if it is not truly the best plan.

      CBR is also suffering from the same problem, but in join optimization there is a potential for the error to compound.

      To reproduce:

      pipeline = EJSON.deserialize([{
          "$match": {
              "$and": [{
                  "o_clerk": {
                      "$eq": "Clerk#000000849"
                  }
              }, {
                  "o_orderdate": {
                      "$gt": {$date: "1992-06-03T00:00:00.000Z"},
                      "$lt": {$date: "1993-12-19T00:00:00.000Z"}
                  }
              }]
          }
      }, {
          "$lookup": {
              "from": "lineitem",
              "localField": "o_orderkey",
              "foreignField": "l_orderkey",
              "as": "lineitem",  
          }
      }, {
          "$unwind": "$lineitem"
      } ,  {
          "$lookup": {
              "from": "customer",
              "localField": "o_custkey",
              "foreignField": "c_custkey",
              "as": "customer",   
          }
      }, {
          "$unwind": "$customer"
      }, {
          "$lookup": {
              "from": "supplier",
              "localField": "customer.c_nationkey",
              "foreignField": "s_nationkey",
              "as": "supplier",
              
          }
      }, {
          "$unwind": "$supplier"
      }
      ]);
      db.orders.aggregate(pipeline).itcount();
      db.orders.aggregate(pipeline).explain().queryPlanner.winningPlan.queryPlan.cardinalityEstimate;
       

      The top-level $match matches no rows and therefore the cardinalityEstimate of all joins and the entire query is zero, while the query returns ~150 rows in reality.  

            Assignee:
            Unassigned
            Reporter:
            Philip Stoev
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: