-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
For example, consider a FETCH stage with the following filter:
{
$match: {
$and: [
{
$or: [
{ a: { $nin: [23565, 11, 9708, 4819] } },
{ a: false }
]
},
{
b: { $lt: 1.13 },
c: { $lte: 726 }
},
{
$or: [
{ d: { $type: 'string' } },
{ e: { $regex: '^y' } }
]
}
],
f: { $lte: 1.15 },
g: { $lte: { s: 1, b: 5 } }
}
}
The filter expression has 8 leaf expressions. SERVER-113632 calibrated the cost of evaluating a single leaf expression. A naive cost model could simply multiply that cost by 8 (the number of leaves). However, that will usually be a huge overestimation of the actual cost because of short circuiting.
When the first child expression of $and evaluates to false, none of the remaining child expressions need to be evaluated. Similarly for $or, once a child expression evaluates to true, none of the other child expressions need to be looked at. The more complex the filter, the unlikelier it becomes that each leaf expression will be evaluated.
The current costing logic only assigns the cost of 1 leaf expression even if there are more, because it turns out better plan choices are made when underestimating the filter cost than when overestimating.
However, there are many possible ways in which we can make a smarter estimation of the actual leaf evaluations. Since we have a selectivity estimate for each predicate, we basically just have to solve the probability problem to find E(number of leaves evaluated) where E is the expected value. We of course don't have perfect information, because we can't perfectly estimate selectivities for conjuncts or disjuncts.
- is related to
-
SERVER-113632 Add incremental cost for every leaf in the filter expression
-
- Closed
-