Details

      Description

      Update:

      Index intersection is on by default. The query planner currently limits itself to 2 indices at a time, though this limit may increase in the future.

      Original Text:

      This should support both single and multiple index intersections.

      This should support queries that which want to use a covered index; then complex intersections would never need to hit the document to return the values.

      A more detailed description of index intersection in a traditional RDMS can be found here:

      http://www.sql-server-performance.com/2003/index-intersection/

        Issue Links

          Activity

          Hide
          paulkon Paul Konova added a comment - - edited

          So,

          a) will this support query + sort index intersection?
          b) what is the max number of intersected indexes per query + sort?
          c) what is the performance difference between intersections and compound indexes?
          d) can this be considered a replacement for compound indexes?

          This would make my index optimization an order of magnitude easier if it works essentially as a replacement for large compound indexes.

          Show
          paulkon Paul Konova added a comment - - edited So, a) will this support query + sort index intersection? b) what is the max number of intersected indexes per query + sort? c) what is the performance difference between intersections and compound indexes? d) can this be considered a replacement for compound indexes? This would make my index optimization an order of magnitude easier if it works essentially as a replacement for large compound indexes.
          Hide
          jimwang Jim Wang added a comment -

          We are also looking for query + sort index intersection.
          Currently we have to create lots of compound indexes to support query + sort.
          It seems index intersection only applies to query for the coming 2.6 release.
          Is there any plan to support query + sort in the future release?

          Show
          jimwang Jim Wang added a comment - We are also looking for query + sort index intersection. Currently we have to create lots of compound indexes to support query + sort. It seems index intersection only applies to query for the coming 2.6 release. Is there any plan to support query + sort in the future release?
          Hide
          david.storch David Storch added a comment - - edited

          Hi all,

          Here are answers to some common questions that we are receiving about the index intersection feature, to be included in the upcoming v2.6 release. We will add to this list of Q&A as we hear from users.

          Q: What is the maximum number of indices that can be intersected?
          A: Only intersections between 2 indices are currently considered. Without this limitation, the optimizer may have to consider every possible combination of indices, which could be prohibitively large. (Alternately, the size of the search space could be heuristically pruned, but this will not be part of v2.6.)

          The limitation of 2 index scans does not apply to "self-intersections". To understand "self-intersections", consider the common use case of indexing an array of attributes, e.g. attributes: ["electronics", "homeGoods", "laptops"]. Say we query for {attributes: {$all: ["electronics", "laptops"]}}. In previous versions of MongoDB, this would retrieve all documents with the index key "electronics", and then filter the result set by documents that also contain the tag "laptops". We cannot simply retrieve all documents with either the index key "electronics" or the index key "laptops" because then the result set could contain documents that contain one of the two tags but not both. With index intersection, we can now retrieve the documents with "electronics" and the documents with "laptops" using two separate index scans, and then fetch only the intersection of the two scans. This "self-intersection" can considerably reduce the number of complete documents that the query execution needs to retrieve. We currently cap the number of "self-intersection" scans at 10.

          Q: How does the performance of index intersection compare to compound indices? When should I use a compound index, and when should I rely on index intersection?
          A: Using a compound index is almost always faster than index intersection, and the query optimizer will not even consider an index intersection plan if a compound index is available over the same fields. Index intersection generally needs to scan more index keys than a compound index in order to find documents satisfying the relevant predicates. As such, index intersection should not be considered a replacement for compound indices. If a large percentage of your workload is to look up documents on fields 'a', 'b', and 'c', then it is probably worth building a compound index on these fields.

          Relying on index intersection may make more sense if you have a more varied workload: sometimes you look up by (a, b), sometimes by (b, c), sometimes by (c, d), sometimes just by 'b', etc. While you certainly could solve this problem by building many compound indices, it may make more sense to build several single-field indices. Rather than predicting which pairs of fields you will query on, you can use index intersection determine these pairs in an ad hoc fashion.

          Q: Will v2.6 support query + sort intersection?
          A: Yes. One caveat is that the query must contain predicates over both indices. Consider a collection with indices on {zipcode:1} and {last_name:1}. We will not perform index intersection for db.phonebook.find({zipcode: "12345"}).sort({last_name: 1}). On the other hand, we will consider an index intersection plan for db.phonebook.find({zipcode: "12345", last_name: /^S/}).sort({last_name: 1}).

          The reason the sort field is required in the query predicate is because without it we would have to scan the entire index in order to obtain the sort.

          Q: When will the query optimizer select index intersection plans over single-index plans?
          A: The query optimizer may select index intersection plans when the following conditions hold:

          1. Most of the documents in the relevant collection are disk-resident. The advantage of index intersection is that it can avoid fetching complete documents when the size of the intersection is small. If the documents are already in memory, there is nothing to gain by avoiding fetches.
          2. The query predicates are single point intervals, rather than range predicates or a set of intervals. Queries over single point intervals return documents sorted by disk location, which allows the optimizer to select plans that compute the intersection in a non-blocking fashion. This is generally faster than the alternative mode of computing the intersection, which is to build a hash table with the results from one index, and then probe it with the results from the second index.
          3. Neither of the indices to be intersected are highly selective. If one of the indices is selective then the optimizer will choose a plan which simply scans this selective index.
          4. The size of the intersection is small relative to the number of index keys scanned by either single-index solution. In this case the query executor can look at a smaller set of documents using index intersection, potentially allowing us to reap the benefits of fewer fetches from disk.
          Show
          david.storch David Storch added a comment - - edited Hi all, Here are answers to some common questions that we are receiving about the index intersection feature, to be included in the upcoming v2.6 release. We will add to this list of Q&A as we hear from users. Q: What is the maximum number of indices that can be intersected? A: Only intersections between 2 indices are currently considered. Without this limitation, the optimizer may have to consider every possible combination of indices, which could be prohibitively large. (Alternately, the size of the search space could be heuristically pruned, but this will not be part of v2.6.) The limitation of 2 index scans does not apply to "self-intersections". To understand "self-intersections", consider the common use case of indexing an array of attributes, e.g. attributes: ["electronics", "homeGoods", "laptops"]. Say we query for {attributes: {$all: ["electronics", "laptops"]}}. In previous versions of MongoDB, this would retrieve all documents with the index key "electronics", and then filter the result set by documents that also contain the tag "laptops". We cannot simply retrieve all documents with either the index key "electronics" or the index key "laptops" because then the result set could contain documents that contain one of the two tags but not both. With index intersection, we can now retrieve the documents with "electronics" and the documents with "laptops" using two separate index scans, and then fetch only the intersection of the two scans. This "self-intersection" can considerably reduce the number of complete documents that the query execution needs to retrieve. We currently cap the number of "self-intersection" scans at 10. Q: How does the performance of index intersection compare to compound indices? When should I use a compound index, and when should I rely on index intersection? A: Using a compound index is almost always faster than index intersection, and the query optimizer will not even consider an index intersection plan if a compound index is available over the same fields. Index intersection generally needs to scan more index keys than a compound index in order to find documents satisfying the relevant predicates. As such, index intersection should not be considered a replacement for compound indices. If a large percentage of your workload is to look up documents on fields 'a', 'b', and 'c', then it is probably worth building a compound index on these fields. Relying on index intersection may make more sense if you have a more varied workload: sometimes you look up by (a, b), sometimes by (b, c), sometimes by (c, d), sometimes just by 'b', etc. While you certainly could solve this problem by building many compound indices, it may make more sense to build several single-field indices. Rather than predicting which pairs of fields you will query on, you can use index intersection determine these pairs in an ad hoc fashion. Q: Will v2.6 support query + sort intersection? A: Yes. One caveat is that the query must contain predicates over both indices. Consider a collection with indices on {zipcode:1} and {last_name:1}. We will not perform index intersection for db.phonebook.find({zipcode: "12345"}).sort({last_name: 1}). On the other hand, we will consider an index intersection plan for db.phonebook.find({zipcode: "12345", last_name: /^S/}).sort({last_name: 1}). The reason the sort field is required in the query predicate is because without it we would have to scan the entire index in order to obtain the sort. Q: When will the query optimizer select index intersection plans over single-index plans? A: The query optimizer may select index intersection plans when the following conditions hold: Most of the documents in the relevant collection are disk-resident. The advantage of index intersection is that it can avoid fetching complete documents when the size of the intersection is small. If the documents are already in memory, there is nothing to gain by avoiding fetches. The query predicates are single point intervals, rather than range predicates or a set of intervals. Queries over single point intervals return documents sorted by disk location, which allows the optimizer to select plans that compute the intersection in a non-blocking fashion. This is generally faster than the alternative mode of computing the intersection, which is to build a hash table with the results from one index, and then probe it with the results from the second index. Neither of the indices to be intersected are highly selective. If one of the indices is selective then the optimizer will choose a plan which simply scans this selective index. The size of the intersection is small relative to the number of index keys scanned by either single-index solution. In this case the query executor can look at a smaller set of documents using index intersection, potentially allowing us to reap the benefits of fewer fetches from disk.
          Hide
          nevi_me Neville Dipale added a comment -

          Thanks David, that clarifies a lot of things. For example, if I would have read somewhere whilst following this issue that the limit of 2 indices is because of arrays, I'd have understood. Certainly will be interesting to see how an increase in the indices works in future versions, especially if there's a good was of applying heuristics to the search space.

          Thanks again, I for one really appreciate it!

          Show
          nevi_me Neville Dipale added a comment - Thanks David, that clarifies a lot of things. For example, if I would have read somewhere whilst following this issue that the limit of 2 indices is because of arrays, I'd have understood. Certainly will be interesting to see how an increase in the indices works in future versions, especially if there's a good was of applying heuristics to the search space. Thanks again, I for one really appreciate it!
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

          Message: SERVER-3071 penalize fetches during ranking
          Branch: master
          https://github.com/mongodb/mongo/commit/33912523e40467e1a41bed60ed7e198200a040ad

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'} Message: SERVER-3071 penalize fetches during ranking Branch: master https://github.com/mongodb/mongo/commit/33912523e40467e1a41bed60ed7e198200a040ad

            People

            • Votes:
              71 Vote for this issue
              Watchers:
              75 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                1 year, 22 weeks ago
                Date of 1st Reply: