For queries with multiple predicates over a multikey field, you can't intersect the index bounds. Suppose there is a multikey index on field 'a'. The following query
db.t.find({a: {$lte: 0, $gte: 1}});
would have empty index bounds if 'a' were not a multikey field. Since it is multikey, however, we either use bounds
[-Infinity, 0]
or
[1, Infinity]
The implementation currently chooses between the possible bounds by selecting one of the predicates over the indexed field. In particular, it just chooses the first predicate.
The parsed query tree is always sorted according to predicate type, which means that sometimes we build multikey bounds using negation predicates rather than non-negation predicates:
db.t.insert({a: [1, 2, 3]}); db.t.ensureIndex({a: 1}); db.t.find({a: {$in: [1], $nin: [2]}}).explain().queryPlanner.winningPlan { ... "indexBounds" : { "a" : [ "[MinKey, 2.0)", "(2.0, MaxKey]" ] } ... }
The explain output above shows that we selected bounds using the $nin predicate rather than the $in. Although we don't definitively know which bounds are the most selective because the system does not keep stats/histograms describing the data, it is generally a good heuristic to choose bounds from non-negation predicates over bounds from negation predicates. (Bounds from negation predicates tend to be wider.)
We should ensure that we always choose non-negation bounds over negation bounds by sorting non-negation predicates before negations in the parsed query tree.
- is related to
-
SERVER-12281 When choosing multikey index bounds, never choose a superset if a subset is available
- Backlog
-
SERVER-16042 Optimise $all/$and to select smallest subset as initial index bounds
- Closed