[SERVER-12281] When choosing multikey index bounds, never choose a superset if a subset is available Created: 07/Jan/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.5.4
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 11
Labels: asya
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-14519 $and $ne performance degradation in 2.6 Closed
is duplicated by SERVER-15214 $nin operator avoid query execution o... Closed
is duplicated by SERVER-15658 Query with $in and $nin doesn't use i... Closed
Related
related to SERVER-12470 document and audit multikey bounds be... Closed
related to SERVER-18364 Ensure non-negation predicates get ch... Closed
is related to SERVER-13273 multiple predicates for $all and mult... Closed
is related to SERVER-16042 Optimise $all/$and to select smallest... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

For queries with multiple predicates over a multikey field, you can't intersect the index bounds. Suppose there is a multikey index on field 'a'. The following query

db.t.find({a: {$lte: 0, $gte: 1}});

would have empty index bounds if 'a' were not a multikey field. Since it is multikey, however, we either use bounds

[-Infinity, 0]

or

[1, Infinity]

Which bounds to use is chosen arbitrarily. As an optimization, we could make sure that we always choose the smallest interval over which to perform the index scan. For instance, for the query

db.t.find({a: {$gt: 4, $gte: 5}});

we would choose

[5, Infinity]

rather than

(4, Infinity]

because the former is a subset of the latter.



 Comments   
Comment by David Storch [ 27/Oct/15 ]

As of the fix for SERVER-16042, the server no longer chooses the multikey index bounds arbitrarily. Instead, the server will generate alternative plans using each possible set of bounds, and rank these candidates against one another. If a particular set of bounds is significantly more selective, this should be discovered during the plan ranking process.

As an optimization, we could still consider suppressing generation of plans whose bounds are a superset of another plan's bounds. However, SERVER-16042 solves the immediate problem and makes this work much less urgent.

Comment by David Storch [ 11/May/15 ]

Hi tredman@fb.com,

An update on your question:

This bit us too. Query is in the form { foo: { $in: [ a ], $nin: [ b, c, ..., z ] } }. Basically results in a full index scan. Any chance of this getting back-ported to 2.6?

We have a fix for this query under related ticket SERVER-18364. The fix is currently in master, and will be available in 2.6.10 as well as 3.0.4. The RC process for 2.6.10 is set to begin within the next few weeks.

Best,
Dave

Comment by Asya Kamsky [ 09/May/15 ]

tredman@fb.com we can request a backport only once there is a fix committed to master.

Comment by Travis Redman [ 08/May/15 ]

This bit us too. Query is in the form { foo:

{ $in: [ a ], $nin: [ b, c, ..., z ] }

}. Basically results in a full index scan. Any chance of this getting back-ported to 2.6?

Comment by Sebastian Friedel [ 26/Jan/15 ]

I was bitten by this as well, query went from 7ms to 71661ms after upgrade to MongoDB 2.6.7 (from 2.4.12). Query and details in case anyone is interested: http://pastebin.com/Sca2c3qS

Comment by Thomas Boutell [ 14/Nov/14 ]

+1; I was surprised by the harsh slowdown imposed by the presence of $nin as well as $in.

Generated at Thu Feb 08 03:28:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.