[SERVER-13732] Predicates in top-level implicit AND query not considered when generating index access plan for contained OR Created: 25/Apr/14  Updated: 27/Sep/19  Resolved: 28/Feb/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0
Fix Version/s: 3.5.4

Type: Bug Priority: Critical - P2
Reporter: J Rassi Assignee: Tess Avitabile (Inactive)
Resolution: Done Votes: 18
Labels: 2426, todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-23251 Optimizer can not use whole index fie... Closed
is duplicated by SERVER-24209 Limit() phase does not appear to work... Closed
is duplicated by SERVER-25442 OR query does not use middle field of... Closed
is duplicated by SERVER-31929 Inefficient plan for $or query Closed
is duplicated by SERVER-14130 Query planner could consider addition... Closed
Related
related to SERVER-36393 Contained $or pushdown optimization c... Backlog
related to SERVER-32441 3.6 mongod crash on find with index a... Closed
related to SERVER-15012 Server crashes on indexed rooted $or ... Closed
related to SERVER-13104 Plan enumerator doesn't enumerate all... Closed
related to SERVER-14740 Query rewrite of special $or leaf cas... Closed
related to SERVER-27904 Extend support for moving predicates ... Closed
related to SERVER-42558 Complete TODO listed in SERVER-13732 Closed
related to SERVER-18777 CachedPlanStage replanning mechanism ... Backlog
is related to SERVER-13184 2.6 should explore same number of one... Closed
is related to SERVER-15208 Inefficient index bounds applied to $... Closed
is related to SERVER-19388 assertion in sort.cpp Closed
is related to SERVER-31280 Implementing paging using range queri... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Quint Iteration 3, Quint Iteration 4, Quint Iteration 5, Quint Iteration 6, Query 2017-01-23, Query 2017-02-13, Query 2017-03-06
Participants:
Case:

 Description   
Issue Status as of Feb 28, 2017

ISSUE SUMMARY

A "contained $or" is a query predicate which has an explicit or implicit $and at the top level where one of the clauses of the $and is a $or. For instance, the following two predicates are contained $or queries:

{$and: [{a: 1}, {$or: [{b: 1}, {b: 2}]}, {c: 3}]}
{a: 1, $or: [{b: 1}, {b: 2}], c: 3}

This issue affects contained $or queries. If the query is answered by indexing the contained $or, versions 2.4.x and earlier of the server could also use the predicates outside the $or to constrain the index bounds. Suppose that a collection has two compound indexes {b: 1, a: 1} and {c: 1, a: 1}. Now consider a query with predicate

{a: {$gte: 3, $lte: 6}, $or: [{b: 1}, {c: 2}]}

In versions 2.4.x and earlier, this query is answered by two index scans:

  1. First scanning index {b: 1, a: 1} for keys where b is equal to 1 and a is on the interval [3, 6], and then
  2. scanning index {c: 1, a: 1} for keys where c is equal to 2 and a is on the interval [3, 6].

However, on versions of the server affected by this issue, both index scans will have unconstrained bounds on field a and may examine keys where the value of a is outside the interval [3, 6].

Versions 2.4.x and earlier of the server could also use the predicates outside the $or to provide index bounds for the first field in the index. Consider the same query on a collection with two compound indexes {a: 1, b: 1} and {a: 1, c: 1}. In versions 2.4.x and earlier, this query could be answered by two index scans:

  1. First scanning index {a: 1, b: 1} for keys where a is on the interval [3, 6] and b is equal to 1, and then
  2. scanning index {a: 1, c: 1} for keys where a is on the interval [3, 6] and c is equal to 2.

On versions of the server affected by this issue, this query will always be answered by a single index scan on either {a: 1, b: 1} or {a: 1, c: 1} for keys where a is on the interval [3, 6].

USER IMPACT

Contained $or queries indexed in the manner described above will have looser bounds than they did on previous versions. Depending on the data in the collection, this could mean that the number of index keys examined in order to answer the query will be much higher, causing bad query performance.

WORKAROUNDS

Applications can work around the issue by rewriting contained $or queries as rooted $or queries. A "rooted $or" is a query predicate consisting of a single $or statement. For example, the following query predicate is a rooted $or:

{$or: [{a: 1, b: 1}, {a: 2, b: 2}]}

This rewrite involves distributing any predicates outside the contained $or into each $or clause. For example:

{a: 1, b: 1, $or: [{c: 1}, {d: 1}]}

could become the following equivalent query:

{$or: [{a: 1, b:1, c: 1}, {a: 1, b: 1, d: 1}]}

AFFECTED VERSIONS

MongoDB 2.6.0 through 3.5.3

FIX VERSION
The fix is included in the 3.5.4 development release.

Original description

Certain types of queries containing $or clauses no longer make efficient use of indexes. This is a regression from 2.4 => 2.6.

Given a collection with indexes {a:1, b:1} and {a:1, c:1}, consider the following query:

db.collection.find({a:1, $or: [{b:1}, {c:1}]})

The 2.4 query planner considers a union of an scan on index {a:1, b:1} with a scan on index {a:1, c:1}.

The 2.6 query planner does not consider this plan; it selects a plan that scans the whole {a:1} range for one of the two indexes.

There is an available workaround for this issue, which is to rewrite the query as a rooted $or (e.g. {$or: [{a:1, b:1}, {a:1, c:1}]}). For these queries, the 2.6 query planner correctly considers a plan that performs a union of the separate index scans.



 Comments   
Comment by Githook User [ 17/Feb/17 ]

Author:

{u'username': u'tessavitabile', u'name': u'Tess Avitabile', u'email': u'tess.avitabile@mongodb.com'}

Message: SERVER-13732 Index access plan for contained OR should consider top-level predicates
Branch: master
https://github.com/mongodb/mongo/commit/f77527a942347313e2848e050e89480bc3cadb95

Comment by Githook User [ 01/Sep/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-13732 disable subplanning for contained $or queries
Branch: master
https://github.com/mongodb/mongo/commit/8fde29a7afd0c4fd5cc84ff4f03217f3efe224e1

Comment by David Storch [ 01/Sep/15 ]

Hi all,

After reviewing the fix for this problem included in development release 3.1.6, we have decided that its risks outweigh its benefits. Unfortunately the fix results in a substantial performance regression in the query planner and introduces some fragile logic to the $or query execution machinery. Furthermore, the problem has a relatively straightforward application-level workaround. For these reasons, we are hesitant to include the fix in a stable release. We are disabling the codepath implementing the fix, re-opening this ticket, and refraining from pushing a backport to the 3.0.x stable release series. (Note that although the fix is disabled, we are leaving all the plumbing in place so that enabling in a future release is trivial, should the fix prove to be safe and the performance regression prove to have a minimal impact.)

Our apologies for not fixing this. Please feel free to reach out to me on this ticket with questions or concerns.

Best,
Dave

Comment by Githook User [ 10/Jul/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-13732 rewrite contained $or queries to rooted $or in the SubplanStage

This allows queries with an $or contained within an explicit or implicit $and to be answered with
more efficient plans. It also expands the use of the SubplanStage to include contained $or queries
and therefore may reduce the number of plans considered for these queries.
Branch: master
https://github.com/mongodb/mongo/commit/15c72c8570c63e2e6ba0a3d339a8286d0be604db

Comment by Urs Enke [ 18/Dec/14 ]

You're right – I had mistyped.

Of course, $in is only an alternative when the $or concerns a single field (or when all combinations of the affected fields' values are allowed, so each field could get its own $in without changing the meaning). But whoever is used to MySQL's comparable performance of "b=1 OR b=2" and "b IN (1,2)" could easily assume a suitably simply structured $or to be optimized to use an index, despite the posibility of using the operator in a way that would be harder to optimize.

Comment by Asya Kamsky [ 17/Dec/14 ]

I believe you mean {a:1,$or:[{b:1},{b:2}]} correct?

The issue is when the two queries within $or do not test the same field - then they cannot be rewritten using $in.

Comment by Urs Enke [ 17/Dec/14 ]

A small note: Queries like

{a:1, $or:[{b:1},{b:2}]}

could be rewritten as (or executed like)

{a:1, b:{$in:[1,2]}}

, which may be more efficient than changing anything at the root. One could argue that in such cases it's the user's job to use $in, but (s)he may expect $or to be an efficient choice as well given that it is when used at the root.

Comment by David Storch [ 23/Sep/14 ]

Hi all,

A quick status update on this ticket follows.

We have explored two options for solving this issue on the 2.6 branch:

  • Introduce a logical rewrite that rearranges the abstract syntax tree into a rooted-$or.
  • Alter our exploration of the plan space so that it is capable of enumerating the missed access plans.

After technical review, both of these solutions were deemed to be a step in the wrong direction. The system does not currently have a rewrite engine. Doing this particular logical rewrite breaks some internal assumptions (for instance, the update system's logic for computing the document to insert on {upsert: true} updates depends on the absence of any logical rewrites). Furthermore, this change would improve the plans for one class of queries at the expense of another class.

Regarding the plan space exploration solution: moving forward we hope to alter our exploration strategy in order to be more flexible about which plans we consider and which we don't. Using the current strategy for exploring the plan space, this change is technically difficult if not infeasible.

For the reasons listed above, backport to the 2.6 branch is currently not planned.

Best,
Dave

Comment by J Rassi [ 25/Apr/14 ]

Further detail follows.

In version 2.4 and earlier: for specific classes of queries containing $or clauses, the query planner would consider a plan that unions index scans on separate indexes. Queries that qualified include:

  1. queries rooted at an $or
    • e.g. for the query {$or: [{b:1}, {c:1}]}, the planner would consider a union of a scan on index {b:1} with a scan on index {c:1}
  2. queries rooted at an implicit $and that have an $or child
    • e.g. for the query {a:1, $or: [{b:1}, {c:1}]}, the planner would additionally consider a union of a scan on index {a:1, b:1} with a scan on index {a:1, c:1}

For all other queries containing an $or clause, the 2.4 query planner would not consider assigning indexes to the $or child predicates.

In version 2.6.0: the query planner always considers an indexed plan for $or, but the indexes chosen for the union are only based on the predicates contained in the $or children. That is, when planning the query {a:1, $or: [{b:1}, {c:1}]}, the 2.6 query planner will not consider the {a:1} predicate when creating a plan that takes a union of separate indexes. Employing the workaround (that is, rewriting the query as {$or: [{a:1, b:1}, {a:1, c:1}]}) allows both predicates to be used.

Generated at Thu Feb 08 03:32:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.