Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-3892

consider multiple candidate index scans, potentially with intersection, in cases of distinct ranges on a multikey index

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Querying
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible

      Description

      When an index is multikey and a query specifies two predicates on an indexed field, and the index bounds produced by one predicate do not entirely include the index bounds produced by the other predicate, the index bounds for only one predicate will be used for the query. Generally the index bounds for the first predicate parsed are chosen, while the index bounds for the other predicate are ignored.

      This behavior was requested here <https://jira.mongodb.org/browse/SERVER-958?focusedCommentId=19122&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-19122>.

      We could instead attempt to use the index bounds for both predicates to evaluate a query like this. For example we could create a separate query plan for each predicate's index bounds and analyze/run each plan in the query optimizer. Or we could create a plan with an index and over ranges for both index bounds.

      In the example below, the predicate a:2 produces index bounds [[ 2, 2 ]] while the predicate a:{ $gt:5 } produces index bounds [[ 5, max_number ]]. Currently the first query only uses the index bounds [[ 2, 2 ]] only while the second query uses the index bounds [[ 5, max_number ]] only.

      Aaron

      --------------------------------------------------

      db.foo.save({a: [1, 2, 3, 4, 5]})
      db.foo.save({a: [1, 2, 3, 4, 5, 6]})
      db.foo.ensureIndex({a: 1})
       
      db.foo.find({$and: [{a: 2}, {a: {$gt: 5}}]}).explain()
      db.foo.find({$and: [{a: {$gt: 5}, {a: 2}}]}).explain()

      The first of these queries scans two objects, while the second scans only one. Is there a way to implement this that provides a more consistent result? If we had a million documents with {a: 2}, then the first query would scan a lot of documents.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: