Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-13732

Predicates in top-level implicit AND query not considered when generating index access plan for contained OR

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 3.5.4
    • Component/s: Querying
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Quint Iteration 3, Quint Iteration 4, Quint Iteration 5, Quint Iteration 6, Query 2017-01-23, Query 2017-02-13, Query 2017-03-06

      Description

      Issue Status as of Feb 28, 2017

      ISSUE SUMMARY

      A "contained $or" is a query predicate which has an explicit or implicit $and at the top level where one of the clauses of the $and is a $or. For instance, the following two predicates are contained $or queries:

      {$and: [{a: 1}, {$or: [{b: 1}, {b: 2}]}, {c: 3}]}
      {a: 1, $or: [{b: 1}, {b: 2}], c: 3}
      

      This issue affects contained $or queries. If the query is answered by indexing the contained $or, versions 2.4.x and earlier of the server could also use the predicates outside the $or to constrain the index bounds. Suppose that a collection has two compound indexes {b: 1, a: 1} and {c: 1, a: 1}. Now consider a query with predicate

      {a: {$gte: 3, $lte: 6}, $or: [{b: 1}, {c: 2}]}
      

      In versions 2.4.x and earlier, this query is answered by two index scans:

      1. First scanning index {b: 1, a: 1} for keys where b is equal to 1 and a is on the interval [3, 6], and then
      2. scanning index {c: 1, a: 1} for keys where c is equal to 2 and a is on the interval [3, 6].

      However, on versions of the server affected by this issue, both index scans will have unconstrained bounds on field a and may examine keys where the value of a is outside the interval [3, 6].

      Versions 2.4.x and earlier of the server could also use the predicates outside the $or to provide index bounds for the first field in the index. Consider the same query on a collection with two compound indexes {a: 1, b: 1} and {a: 1, c: 1}. In versions 2.4.x and earlier, this query could be answered by two index scans:

      1. First scanning index {a: 1, b: 1} for keys where a is on the interval [3, 6] and b is equal to 1, and then
      2. scanning index {a: 1, c: 1} for keys where a is on the interval [3, 6] and c is equal to 2.

      On versions of the server affected by this issue, this query will always be answered by a single index scan on either {a: 1, b: 1} or {a: 1, c: 1} for keys where a is on the interval [3, 6].

      USER IMPACT

      Contained $or queries indexed in the manner described above will have looser bounds than they did on previous versions. Depending on the data in the collection, this could mean that the number of index keys examined in order to answer the query will be much higher, causing bad query performance.

      WORKAROUNDS

      Applications can work around the issue by rewriting contained $or queries as rooted $or queries. A "rooted $or" is a query predicate consisting of a single $or statement. For example, the following query predicate is a rooted $or:

      {$or: [{a: 1, b: 1}, {a: 2, b: 2}]}
      

      This rewrite involves distributing any predicates outside the contained $or into each $or clause. For example:

      {a: 1, b: 1, $or: [{c: 1}, {d: 1}]}
      

      could become the following equivalent query:

      {$or: [{a: 1, b:1, c: 1}, {a: 1, b: 1, d: 1}]}
      

      AFFECTED VERSIONS

      MongoDB 2.6.0 through 3.5.3

      FIX VERSION
      The fix is included in the 3.5.4 development release.

      Original description

      Certain types of queries containing $or clauses no longer make efficient use of indexes. This is a regression from 2.4 => 2.6.

      Given a collection with indexes {a:1, b:1} and {a:1, c:1}, consider the following query:

      db.collection.find({a:1, $or: [{b:1}, {c:1}]})
      

      The 2.4 query planner considers a union of an scan on index {a:1, b:1} with a scan on index {a:1, c:1}.

      The 2.6 query planner does not consider this plan; it selects a plan that scans the whole {a:1} range for one of the two indexes.

      There is an available workaround for this issue, which is to rewrite the query as a rooted $or (e.g. {$or: [{a:1, b:1}, {a:1, c:1}]}). For these queries, the 2.6 query planner correctly considers a plan that performs a union of the separate index scans.

        Issue Links

          Activity

          Hide
          rassi J Rassi (Inactive) added a comment -

          Further detail follows.

          In version 2.4 and earlier: for specific classes of queries containing $or clauses, the query planner would consider a plan that unions index scans on separate indexes. Queries that qualified include:

          1. queries rooted at an $or
            • e.g. for the query {$or: [{b:1}, {c:1}]}, the planner would consider a union of a scan on index {b:1} with a scan on index {c:1}
          2. queries rooted at an implicit $and that have an $or child
            • e.g. for the query {a:1, $or: [{b:1}, {c:1}]}, the planner would additionally consider a union of a scan on index {a:1, b:1} with a scan on index {a:1, c:1}

          For all other queries containing an $or clause, the 2.4 query planner would not consider assigning indexes to the $or child predicates.

          In version 2.6.0: the query planner always considers an indexed plan for $or, but the indexes chosen for the union are only based on the predicates contained in the $or children. That is, when planning the query {a:1, $or: [{b:1}, {c:1}]}, the 2.6 query planner will not consider the {a:1} predicate when creating a plan that takes a union of separate indexes. Employing the workaround (that is, rewriting the query as {$or: [{a:1, b:1}, {a:1, c:1}]}) allows both predicates to be used.

          Show
          rassi J Rassi (Inactive) added a comment - Further detail follows. In version 2.4 and earlier: for specific classes of queries containing $or clauses, the query planner would consider a plan that unions index scans on separate indexes. Queries that qualified include: queries rooted at an $or e.g. for the query {$or: [{b:1}, {c:1}]}, the planner would consider a union of a scan on index {b:1} with a scan on index {c:1} queries rooted at an implicit $and that have an $or child e.g. for the query {a:1, $or: [{b:1}, {c:1}]}, the planner would additionally consider a union of a scan on index {a:1, b:1} with a scan on index {a:1, c:1} For all other queries containing an $or clause, the 2.4 query planner would not consider assigning indexes to the $or child predicates. In version 2.6.0: the query planner always considers an indexed plan for $or, but the indexes chosen for the union are only based on the predicates contained in the $or children. That is, when planning the query {a:1, $or: [{b:1}, {c:1}]}, the 2.6 query planner will not consider the {a:1} predicate when creating a plan that takes a union of separate indexes. Employing the workaround (that is, rewriting the query as {$or: [{a:1, b:1}, {a:1, c:1}]}) allows both predicates to be used.
          Hide
          david.storch David Storch added a comment -

          Hi all,

          A quick status update on this ticket follows.

          We have explored two options for solving this issue on the 2.6 branch:

          • Introduce a logical rewrite that rearranges the abstract syntax tree into a rooted-$or.
          • Alter our exploration of the plan space so that it is capable of enumerating the missed access plans.

          After technical review, both of these solutions were deemed to be a step in the wrong direction. The system does not currently have a rewrite engine. Doing this particular logical rewrite breaks some internal assumptions (for instance, the update system's logic for computing the document to insert on {upsert: true} updates depends on the absence of any logical rewrites). Furthermore, this change would improve the plans for one class of queries at the expense of another class.

          Regarding the plan space exploration solution: moving forward we hope to alter our exploration strategy in order to be more flexible about which plans we consider and which we don't. Using the current strategy for exploring the plan space, this change is technically difficult if not infeasible.

          For the reasons listed above, backport to the 2.6 branch is currently not planned.

          Best,
          Dave

          Show
          david.storch David Storch added a comment - Hi all, A quick status update on this ticket follows. We have explored two options for solving this issue on the 2.6 branch: Introduce a logical rewrite that rearranges the abstract syntax tree into a rooted-$or. Alter our exploration of the plan space so that it is capable of enumerating the missed access plans. After technical review, both of these solutions were deemed to be a step in the wrong direction. The system does not currently have a rewrite engine. Doing this particular logical rewrite breaks some internal assumptions (for instance, the update system's logic for computing the document to insert on {upsert: true} updates depends on the absence of any logical rewrites). Furthermore, this change would improve the plans for one class of queries at the expense of another class. Regarding the plan space exploration solution: moving forward we hope to alter our exploration strategy in order to be more flexible about which plans we consider and which we don't. Using the current strategy for exploring the plan space, this change is technically difficult if not infeasible. For the reasons listed above, backport to the 2.6 branch is currently not planned. Best, Dave
          Hide
          urs.enke Urs Enke added a comment - - edited

          A small note: Queries like

          {a:1, $or:[{b:1},{b:2}]}

          could be rewritten as (or executed like)

          {a:1, b:{$in:[1,2]}}

          , which may be more efficient than changing anything at the root. One could argue that in such cases it's the user's job to use $in, but (s)he may expect $or to be an efficient choice as well given that it is when used at the root.

          Show
          urs.enke Urs Enke added a comment - - edited A small note: Queries like {a:1, $or:[{b:1},{b:2}]} could be rewritten as (or executed like) {a:1, b:{$in:[1,2]}} , which may be more efficient than changing anything at the root. One could argue that in such cases it's the user's job to use $in, but (s)he may expect $or to be an efficient choice as well given that it is when used at the root.
          Hide
          asya Asya Kamsky added a comment -

          I believe you mean {a:1,$or:[{b:1},{b:2}]} correct?

          The issue is when the two queries within $or do not test the same field - then they cannot be rewritten using $in.

          Show
          asya Asya Kamsky added a comment - I believe you mean { a:1,$or:[{b:1},{b:2}] } correct? The issue is when the two queries within $or do not test the same field - then they cannot be rewritten using $in.
          Hide
          urs.enke Urs Enke added a comment -

          You're right – I had mistyped.

          Of course, $in is only an alternative when the $or concerns a single field (or when all combinations of the affected fields' values are allowed, so each field could get its own $in without changing the meaning). But whoever is used to MySQL's comparable performance of "b=1 OR b=2" and "b IN (1,2)" could easily assume a suitably simply structured $or to be optimized to use an index, despite the posibility of using the operator in a way that would be harder to optimize.

          Show
          urs.enke Urs Enke added a comment - You're right – I had mistyped. Of course, $in is only an alternative when the $or concerns a single field (or when all combinations of the affected fields' values are allowed, so each field could get its own $in without changing the meaning). But whoever is used to MySQL's comparable performance of "b=1 OR b=2" and "b IN (1,2)" could easily assume a suitably simply structured $or to be optimized to use an index, despite the posibility of using the operator in a way that would be harder to optimize.
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

          Message: SERVER-13732 rewrite contained $or queries to rooted $or in the SubplanStage

          This allows queries with an $or contained within an explicit or implicit $and to be answered with
          more efficient plans. It also expands the use of the SubplanStage to include contained $or queries
          and therefore may reduce the number of plans considered for these queries.
          Branch: master
          https://github.com/mongodb/mongo/commit/15c72c8570c63e2e6ba0a3d339a8286d0be604db

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'} Message: SERVER-13732 rewrite contained $or queries to rooted $or in the SubplanStage This allows queries with an $or contained within an explicit or implicit $and to be answered with more efficient plans. It also expands the use of the SubplanStage to include contained $or queries and therefore may reduce the number of plans considered for these queries. Branch: master https://github.com/mongodb/mongo/commit/15c72c8570c63e2e6ba0a3d339a8286d0be604db
          Hide
          david.storch David Storch added a comment -

          Hi all,

          After reviewing the fix for this problem included in development release 3.1.6, we have decided that its risks outweigh its benefits. Unfortunately the fix results in a substantial performance regression in the query planner and introduces some fragile logic to the $or query execution machinery. Furthermore, the problem has a relatively straightforward application-level workaround. For these reasons, we are hesitant to include the fix in a stable release. We are disabling the codepath implementing the fix, re-opening this ticket, and refraining from pushing a backport to the 3.0.x stable release series. (Note that although the fix is disabled, we are leaving all the plumbing in place so that enabling in a future release is trivial, should the fix prove to be safe and the performance regression prove to have a minimal impact.)

          Our apologies for not fixing this. Please feel free to reach out to me on this ticket with questions or concerns.

          Best,
          Dave

          Show
          david.storch David Storch added a comment - Hi all, After reviewing the fix for this problem included in development release 3.1.6, we have decided that its risks outweigh its benefits. Unfortunately the fix results in a substantial performance regression in the query planner and introduces some fragile logic to the $or query execution machinery. Furthermore, the problem has a relatively straightforward application-level workaround. For these reasons, we are hesitant to include the fix in a stable release. We are disabling the codepath implementing the fix, re-opening this ticket, and refraining from pushing a backport to the 3.0.x stable release series. (Note that although the fix is disabled, we are leaving all the plumbing in place so that enabling in a future release is trivial, should the fix prove to be safe and the performance regression prove to have a minimal impact.) Our apologies for not fixing this. Please feel free to reach out to me on this ticket with questions or concerns. Best, Dave
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

          Message: SERVER-13732 disable subplanning for contained $or queries
          Branch: master
          https://github.com/mongodb/mongo/commit/8fde29a7afd0c4fd5cc84ff4f03217f3efe224e1

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'} Message: SERVER-13732 disable subplanning for contained $or queries Branch: master https://github.com/mongodb/mongo/commit/8fde29a7afd0c4fd5cc84ff4f03217f3efe224e1
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

          Message: SERVER-13732 Index access plan for contained OR should consider top-level predicates
          Branch: master
          https://github.com/mongodb/mongo/commit/f77527a942347313e2848e050e89480bc3cadb95

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'} Message: SERVER-13732 Index access plan for contained OR should consider top-level predicates Branch: master https://github.com/mongodb/mongo/commit/f77527a942347313e2848e050e89480bc3cadb95

            People

            • Votes:
              18 Vote for this issue
              Watchers:
              46 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                  Agile