[SERVER-3892] consider multiple candidate index scans, potentially with intersection, in cases of distinct ranges on a multikey index Created: 16/Sep/11  Updated: 14/Apr/16  Resolved: 26/Jan/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kyle Banker Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-16042 Optimise $all/$and to select smallest... Closed
Related
related to SERVER-8790 Introduce composable "stages" in quer... Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

When an index is multikey and a query specifies two predicates on an indexed field, and the index bounds produced by one predicate do not entirely include the index bounds produced by the other predicate, the index bounds for only one predicate will be used for the query. Generally the index bounds for the first predicate parsed are chosen, while the index bounds for the other predicate are ignored.

This behavior was requested here <https://jira.mongodb.org/browse/SERVER-958?focusedCommentId=19122&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-19122>.

We could instead attempt to use the index bounds for both predicates to evaluate a query like this. For example we could create a separate query plan for each predicate's index bounds and analyze/run each plan in the query optimizer. Or we could create a plan with an index and over ranges for both index bounds.

In the example below, the predicate a:2 produces index bounds [[ 2, 2 ]] while the predicate a:{ $gt:5 } produces index bounds [[ 5, max_number ]]. Currently the first query only uses the index bounds [[ 2, 2 ]] only while the second query uses the index bounds [[ 5, max_number ]] only.

Aaron

--------------------------------------------------

db.foo.save({a: [1, 2, 3, 4, 5]})
db.foo.save({a: [1, 2, 3, 4, 5, 6]})
db.foo.ensureIndex({a: 1})
 
db.foo.find({$and: [{a: 2}, {a: {$gt: 5}}]}).explain()
db.foo.find({$and: [{a: {$gt: 5}, {a: 2}}]}).explain()

The first of these queries scans two objects, while the second scans only one. Is there a way to implement this that provides a more consistent result? If we had a million documents with {a: 2}, then the first query would scan a lot of documents.



 Comments   
Comment by David Storch [ 26/Jan/15 ]

This ticket describes a problem that still exists in the new query system written for 2.6. For the new query system, this issue is being tracked under SERVER-16042. Since SERVER-16042 ticket is newer and has more context, I am closing this one as a duplicate.

Generated at Thu Feb 08 03:04:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.