Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
None
Query Optimization
Fully Compatible

The semantics of filtering on arrays vary across operators. Specifically,
[{$match: {a: {$gte: n, $lte: m}}}]
differs from
[{$match: {a: {$elemMatch: {$gte: n, $lte: m}}}}]
in two regards:
- The former considers both scalar and array values, whereas `$elemMatch` considers only arrays.
- When filtering on arrays, the former translates the condition into a conjunction of intervals, each of which may be satisfied by a different array element. As a result, the two queries can return different documents.
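The two regards above can be modeled with a pair of simplified predicates. This is a minimal sketch under stated assumptions, not MongoDB's matcher: `range_match` and `elem_match` are hypothetical names, values are plain numbers or flat numeric arrays, and type bracketing, nested arrays, and missing fields are ignored.

```python
from typing import List, Union

# Simplification: a field value is either a number or a flat list of numbers.
Value = Union[float, List[float]]


def range_match(a: Value, lo: float, hi: float) -> bool:
    """Hypothetical model of {a: {$gte: lo, $lte: hi}}.

    A scalar must satisfy both bounds. For an array, each bound is checked
    independently: some element must be >= lo and some (possibly different)
    element must be <= hi.
    """
    if isinstance(a, list):
        return any(x >= lo for x in a) and any(x <= hi for x in a)
    return lo <= a <= hi


def elem_match(a: Value, lo: float, hi: float) -> bool:
    """Hypothetical model of {a: {$elemMatch: {$gte: lo, $lte: hi}}}.

    Only arrays can match, and a single element must satisfy both bounds.
    """
    if not isinstance(a, list):
        return False
    return any(lo <= x <= hi for x in a)
```

The scalar `5` passes `range_match` but never `elem_match` (the first regard), while `[-100, 100]` passes `range_match` only because different elements satisfy each bound (the second regard).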
Example for the second point, using a collection containing the following documents:
{ "_id" : "R1", "a" : [ -100, -20 ] }
{ "_id" : "R2", "a" : [ 5, 7 ] }
{ "_id" : "R3", "a" : [ 100, 1000 ] }
{ "_id" : "R4", "a" : [ -20, 5 ] }
{ "_id" : "R5", "a" : [ 7, 100 ] }
{ "_id" : "R6", "a" : [ -100, 100 ] }
Query:
[{$match: {a: {$gte: 0, $lte: 10}}}]
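splits the condition into the intervals (-inf, 10] and [0, +inf), each of which may be satisfied by a different element, and therefore returns the documents "R2", "R4", "R5", and "R6". R6 matches because -100 satisfies $lte: 10 and 100 satisfies $gte: 0.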
Query:
[{$match: {a: {$elemMatch: {$gte: 0, $lte: 10}}}}]
returns only the documents "R2", "R4", and "R5", because R6 has no single element in [0, 10].
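Applying simplified versions of both predicates to the six example documents reproduces the two result sets. This is an illustrative sketch that assumes flat numeric arrays; `range_match` and `elem_match` are hypothetical helper names, not server code.

```python
from typing import Dict, List

# The example collection, keyed by _id (dicts keep insertion order).
DOCS: Dict[str, List[int]] = {
    "R1": [-100, -20],
    "R2": [5, 7],
    "R3": [100, 1000],
    "R4": [-20, 5],
    "R5": [7, 100],
    "R6": [-100, 100],
}


def range_match(a: List[int], lo: int, hi: int) -> bool:
    # Independent intervals [lo, +inf) and (-inf, hi]:
    # different elements may satisfy different intervals.
    return any(x >= lo for x in a) and any(x <= hi for x in a)


def elem_match(a: List[int], lo: int, hi: int) -> bool:
    # A single element must fall inside [lo, hi].
    return any(lo <= x <= hi for x in a)


range_ids = [k for k, a in DOCS.items() if range_match(a, 0, 10)]
elem_ids = [k for k, a in DOCS.items() if elem_match(a, 0, 10)]
print(range_ids)  # ['R2', 'R4', 'R5', 'R6']
print(elem_ids)   # ['R2', 'R4', 'R5']
```

R6 is the only document on which the two semantics disagree.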
This semantic difference has an impact on the cardinality estimation algorithm. The MaxDiff histogram already implements different strategies to accommodate both cases.
This ticket introduces an additional query-semantics input to HistogramCardinalityEstimator, allowing the caller to decide which of the two cardinality estimation algorithms the histogram estimator should use for range filters over arrays.
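The new input can be pictured as a selector passed alongside the range bounds. The sketch below is hypothetical: `ArrayRangeSemantics` and `estimate_range_on_arrays` are illustrative names, not the actual HistogramCardinalityEstimator interface, and exact counting over sample arrays stands in for the MaxDiff histogram's estimation strategies.

```python
from enum import Enum
from typing import List


class ArrayRangeSemantics(Enum):
    # Hypothetical names; not the server's actual identifiers.
    INTERVAL_CONJUNCTION = "interval_conjunction"  # {a: {$gte: lo, $lte: hi}}
    ELEM_MATCH = "elem_match"                      # {a: {$elemMatch: {...}}}


def estimate_range_on_arrays(arrays: List[List[float]], lo: float, hi: float,
                             semantics: ArrayRangeSemantics) -> float:
    """Return the fraction of arrays matching [lo, hi] under the chosen semantics.

    Stand-in only: counts matches exactly instead of reading a histogram.
    """
    if not arrays:
        return 0.0
    if semantics is ArrayRangeSemantics.ELEM_MATCH:
        # One element must satisfy both bounds.
        hits = sum(1 for a in arrays if any(lo <= x <= hi for x in a))
    else:
        # Each bound may be satisfied by a different element.
        hits = sum(1 for a in arrays
                   if any(x >= lo for x in a) and any(x <= hi for x in a))
    return hits / len(arrays)
```

On the six example arrays, the interval-conjunction setting yields 4/6 and the `$elemMatch` setting yields 3/6, so supplying the wrong semantics would skew the estimate in one direction or the other.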
- fixes: SERVER-97105 Remove estimateRangeQueryOnArray() because interval semantic difference (Closed)
- is related to: SERVER-97105 Remove estimateRangeQueryOnArray() because interval semantic difference (Closed)