-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
samplingCE will return an estimate of zero when a given predicate matches no rows in the sample. This however is not a guarantee that the collection does not contain any matching rows at all.
A cardinalityEstimate of zero causes the cost to also go down to zero, meaning that this plan will always be picked, even if it is not truly the best plan.
CBR is also suffering from the same problem, but in join optimization there is a potential for the error to compound.
To reproduce:
pipeline = EJSON.deserialize([{
"$match": {
"$and": [{
"o_clerk": {
"$eq": "Clerk#000000849"
}
}, {
"o_orderdate": {
"$gt": {$date: "1992-06-03T00:00:00.000Z"},
"$lt": {$date: "1993-12-19T00:00:00.000Z"}
}
}]
}
}, {
"$lookup": {
"from": "lineitem",
"localField": "o_orderkey",
"foreignField": "l_orderkey",
"as": "lineitem",
}
}, {
"$unwind": "$lineitem"
} , {
"$lookup": {
"from": "customer",
"localField": "o_custkey",
"foreignField": "c_custkey",
"as": "customer",
}
}, {
"$unwind": "$customer"
}, {
"$lookup": {
"from": "supplier",
"localField": "customer.c_nationkey",
"foreignField": "s_nationkey",
"as": "supplier",
}
}, {
"$unwind": "$supplier"
}
]);
db.orders.aggregate(pipeline).itcount();
db.orders.aggregate(pipeline).explain().queryPlanner.winningPlan.queryPlan.cardinalityEstimate;
The top-level $match matches no rows and therefore the cardinalityEstimate of all joins and the entire query is zero, while the query returns ~150 rows in reality.
- is related to
-
SERVER-122265 [Join Optimization] Costing of INLJ ignores cost of sorted-sparse I/O to fetch records from collection
-
- Closed
-
- related to
-
SERVER-123070 CBR: Random/sub-optimal index picked in the face of cardinality estimate of zero
-
- Closed
-