-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Query Optimization
Currently $exists (and possibly other predicates) is estimated via heuristic CE when represented as a MatchExpression. Any leaf predicate different from comparison predicates, And, Or, Not is estimated via
CardinalityEstimator::estimate(const LeafMatchExpression* node)
which is in turn implemented as heuristic CE.
However, given an index on the $exists field, an interval is generated, which in turn ican be estimated via histogram if one is available. Example:
coll.find({missing_50_percent: {$exists: false}}).explain();
The plan is:
winningPlan: { isCached: false, stage: 'FETCH', costEstimate: 5.2621663000000005, cardinalityEstimate: 3000, estimatesMetadata: { ceSource: 'Histogram' }, filter: { missing_50_percent: { '$not': { '$exists': true } } }, inputStage: { stage: 'IXSCAN', costEstimate: 1.225155, cardinalityEstimate: 3000, numKeysEstimate: 3000, estimatesMetadata: { ceSource: 'Histogram' }, keyPattern: { missing_50_percent: 1 }, indexName: 'missing_50_percent_1', isMultiKey: false, multiKeyPaths: { missing_50_percent: [] }, isUnique: false, isSparse: false, isPartial: false, indexVersion: 2, direction: 'forward', indexBounds: { missing_50_percent: [ '[null, null]' ] } } },
While if we drop the index, we use heuristics:
coll.dropIndex({missing_50_percent:1});
coll.find({missing_50_percent: {$exists: false}}).explain();
The plan is:
winningPlan: { isCached: false, stage: 'COLLSCAN', costEstimate: 3.5472981, cardinalityEstimate: 60, numDocsEstimate: 6000, estimatesMetadata: { ceSource: 'Heuristics' }, filter: { missing_50_percent: { '$not': { '$exists': true } } }, direction: 'forward' },
Ideally this ticket should consider all predicates that can be expressed as an interval, and therefore estimated via a histogram.