[SERVER-3071] Index Intersection Created: 09/May/11  Updated: 12/Jul/16  Resolved: 20/Dec/13

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: hari.khalsa@10gen.com
Resolution: Done Votes: 71
Labels: indexing, performance, query_triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-3414 consider using nscannedObjects in add... Closed
is depended on by SERVER-4572 Geospatial index cannot be used in $a... Closed
Duplicate
is duplicated by SERVER-7742 Improve the way indexes are used to d... Closed
is duplicated by SERVER-8753 Query optimizer doesn't use hashed ke... Closed
is duplicated by SERVER-3118 get best intersection match via query Closed
is duplicated by SERVER-8768 implement support for interleaving bt... Closed
is duplicated by SERVER-8140 Compound Index with 2d and Text Closed
Related
related to SERVER-1000 $all with query optimizer Closed
related to SERVER-9148 In text searches with $near filter, r... Closed
related to SERVER-8790 Introduce composable "stages" in quer... Closed
Participants:

 Description   

Update:

Index intersection is on by default. The query planner currently limits itself to 2 indices at a time, though this limit may increase in the future.

Original Text:

This should support both single and multiple index intersections.

This should support queries that which want to use a covered index; then complex intersections would never need to hit the document to return the values.

A more detailed description of index intersection in a traditional RDMS can be found here:

http://www.sql-server-performance.com/2003/index-intersection/



 Comments   
Comment by Githook User [ 04/Mar/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-3071 penalize fetches during ranking
Branch: master
https://github.com/mongodb/mongo/commit/33912523e40467e1a41bed60ed7e198200a040ad

Comment by Neville Dipale [ 04/Mar/14 ]

Thanks David, that clarifies a lot of things. For example, if I would have read somewhere whilst following this issue that the limit of 2 indices is because of arrays, I'd have understood. Certainly will be interesting to see how an increase in the indices works in future versions, especially if there's a good was of applying heuristics to the search space.

Thanks again, I for one really appreciate it!

Comment by David Storch [ 03/Mar/14 ]

Hi all,

Here are answers to some common questions that we are receiving about the index intersection feature, to be included in the upcoming v2.6 release. We will add to this list of Q&A as we hear from users.

Q: What is the maximum number of indices that can be intersected?
A: Only intersections between 2 indices are currently considered. Without this limitation, the optimizer may have to consider every possible combination of indices, which could be prohibitively large. (Alternately, the size of the search space could be heuristically pruned, but this will not be part of v2.6.)

The limitation of 2 index scans does not apply to "self-intersections". To understand "self-intersections", consider the common use case of indexing an array of attributes, e.g. attributes: ["electronics", "homeGoods", "laptops"]. Say we query for {attributes: {$all: ["electronics", "laptops"]}}. In previous versions of MongoDB, this would retrieve all documents with the index key "electronics", and then filter the result set by documents that also contain the tag "laptops". We cannot simply retrieve all documents with either the index key "electronics" or the index key "laptops" because then the result set could contain documents that contain one of the two tags but not both. With index intersection, we can now retrieve the documents with "electronics" and the documents with "laptops" using two separate index scans, and then fetch only the intersection of the two scans. This "self-intersection" can considerably reduce the number of complete documents that the query execution needs to retrieve. We currently cap the number of "self-intersection" scans at 10.

Q: How does the performance of index intersection compare to compound indices? When should I use a compound index, and when should I rely on index intersection?
A: Using a compound index is almost always faster than index intersection, and the query optimizer will not even consider an index intersection plan if a compound index is available over the same fields. Index intersection generally needs to scan more index keys than a compound index in order to find documents satisfying the relevant predicates. As such, index intersection should not be considered a replacement for compound indices. If a large percentage of your workload is to look up documents on fields 'a', 'b', and 'c', then it is probably worth building a compound index on these fields.

Relying on index intersection may make more sense if you have a more varied workload: sometimes you look up by (a, b), sometimes by (b, c), sometimes by (c, d), sometimes just by 'b', etc. While you certainly could solve this problem by building many compound indices, it may make more sense to build several single-field indices. Rather than predicting which pairs of fields you will query on, you can use index intersection determine these pairs in an ad hoc fashion.

Q: Will v2.6 support query + sort intersection?
A: Yes. One caveat is that the query must contain predicates over both indices. Consider a collection with indices on {zipcode:1} and {last_name:1}. We will not perform index intersection for db.phonebook.find({zipcode: "12345"}).sort({last_name: 1}). On the other hand, we will consider an index intersection plan for db.phonebook.find({zipcode: "12345", last_name: /^S/}).sort({last_name: 1}).

The reason the sort field is required in the query predicate is because without it we would have to scan the entire index in order to obtain the sort.

Q: When will the query optimizer select index intersection plans over single-index plans?
A: The query optimizer may select index intersection plans when the following conditions hold:

  1. Most of the documents in the relevant collection are disk-resident. The advantage of index intersection is that it can avoid fetching complete documents when the size of the intersection is small. If the documents are already in memory, there is nothing to gain by avoiding fetches.
  2. The query predicates are single point intervals, rather than range predicates or a set of intervals. Queries over single point intervals return documents sorted by disk location, which allows the optimizer to select plans that compute the intersection in a non-blocking fashion. This is generally faster than the alternative mode of computing the intersection, which is to build a hash table with the results from one index, and then probe it with the results from the second index.
  3. Neither of the indices to be intersected are highly selective. If one of the indices is selective then the optimizer will choose a plan which simply scans this selective index.
  4. The size of the intersection is small relative to the number of index keys scanned by either single-index solution. In this case the query executor can look at a smaller set of documents using index intersection, potentially allowing us to reap the benefits of fewer fetches from disk.
Comment by Jim Wang [ 03/Mar/14 ]

We are also looking for query + sort index intersection.
Currently we have to create lots of compound indexes to support query + sort.
It seems index intersection only applies to query for the coming 2.6 release.
Is there any plan to support query + sort in the future release?

Comment by Paul Konova [ 22/Feb/14 ]

So,

a) will this support query + sort index intersection?
b) what is the max number of intersected indexes per query + sort?
c) what is the performance difference between intersections and compound indexes?
d) can this be considered a replacement for compound indexes?

This would make my index optimization an order of magnitude easier if it works essentially as a replacement for large compound indexes.

Comment by Chad Kreimendahl [ 13/Jan/14 ]

Moving this from 1 to 2 indexes used at a time is essentially a non-change from usability. Those of us looking for intersections were hoping for a much larger set (8,16,32, etc).

Comment by Matt Parlane [ 05/Jan/14 ]

Quick question about this: will this feature allow me to simply have one index on every field I might ever need to filter on (obviously up to the 64 index limit), or is that too simplistic?

Comment by Chad Kreimendahl [ 18/Dec/13 ]

Excellent to see this going into the next major release. We'll be watching this and testing when it's out. Any chance this will work with filter & sorting at the same time?

Comment by Tobias Gassmann [ 04/Nov/13 ]

Is SERVER-4572 really depended on by this issue? Making use of the correct geospatial index is possible with a workaround in the query formulation: Using a comma to express a logical conjunction works fine, while using '$and' in the query will cause the bug as described in SERVER-4572.

Do ',' and '$and' make a difference when it comes to choosing the right index?

Comment by Tad Marshall [ 07/Mar/13 ]

2.5.w is just a scheduling mechanism to pull something to the front of the 2.5.x group. There are enough tickets marked for 2.5.x (meaning planned for some release within the 2.5/2.6 version cycle) that it is helpful to use this method to flag a subset of them. "w" comes before "x" in the alphabet, so it's just a sorting tool.

Comment by rgpublic [ 07/Mar/13 ]

Sorry for the spam but what exactly is the difference between 2.5.x and 2.5.w? I've searched all over the MongoDB and JIRA site and cannot find any information on this peculiar versioning scheme.

Comment by Nick Gerner [ 06/Nov/12 ]

any thoughts on if this is anywhere in the list of priorities?

Comment by Jeffrey Yemin [ 09/Dec/11 ]

That link is dead. I think this is the new location: http://www.sql-server-performance.com/2003/index-intersection/

Comment by Evan Cowden [ 02/Nov/11 ]

This would be an absolutely huge feature for us.

When we present one of our main collections to users, we would like to allow users to choose to search by one field and then sort by another. It would be wonderful if we could index each of the sort fields independently of the original search fields.

Comment by Scott Hernandez (Inactive) [ 23/May/11 ]

It would be nice if this supported all the features listed here: http://sql-server-2000.net/MS.Press-Microsoft.SQL.Server4/_index_intersection_236.htm

This would support a query for two (or more) different fields based on single value indexes.

Generated at Thu Feb 08 03:01:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.