[SERVER-19114] Rooted $or queries with a sort may select a SORT plan over a potentially faster SORT_MERGE plan Created: 24/Jun/15 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Steps To Reproduce: |
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||
| Description |
|
Issue Status as of March 9, 2020 ISSUE DESCRIPTION AND IMPACT Queries consisting of a single $or element at the top level of the query JSON are called "rooted $or" queries. For example:
When a rooted $or query includes a sort specification, the MongoDB query planner can choose a sub-optimal query plan. This is because the planner evaluates each $or clause independently and doesn't evaluate each clause with the knowledge that they are being sorted in the same way. As a result, one or more $or clauses may use indexes that do not provide results in the order required by the sort, necessitating a potentially slow blocking SORT stage in the execution plan. DIAGNOSIS AND AFFECTED VERSIONS This issue can be confirmed using explain(). When a blocking sort is being used, a SORT stage will be present (instead of SORT_MERGE) in the .explain("queryPlanner") output. This issue affects all currently supported versions of MongoDB. REMEDIATION AND WORKAROUNDS For cases where all $or clauses can use the same index, one possible workaround is to use hint to that query to force a merge sort. The presence of a SORT_MERGE stage in the explain output indicates that a merge sort is being used. The plan chosen for each clause of the $or is producing the requested sort order, and a total order can be obtained by merging the results from each sorted stream. Another possible workaround to force a merge sort is to constrain index consideration using an index filter. Note that index filters do not persist across server restarts and must be configured individually on each node in the cluster. Additionally, It is necessary to ensure:
Rooted $or queries are planned by a special "subplanner" mechanism in which the best plan for each $or clause is selected individually. For rooted $or queries with a sort, the subplanner's "local" optimization of each clause may result in a suboptimal global query plan. In particular, the subplanner can cause us to choose a plan with a blocking SORT stage, entirely missing the fact that a non-blocking SORT_MERGE plan is available. Suppose that there is a rooted $or query with two clauses and a sort:
Suppose that there are two plans for clause1:
If plan2 is a better plan for clause1, then the subplanner will always select it. However, if the winning plan for clause2 provides the sort sortSpec, then it may actually be better to select plan1 for clause1. Selecting the plan that provides the sort means that both subplans provide the sort, and therefore they can be merged using the merge phase of the mergeSort algorithm, rather than performing a full in-memory sort. The subplanner does not currently evaluate this tradeoff, but rather always chooses plan2. See repro steps for a simple script to reproduce. |
| Comments |
| Comment by Ana Meza [ 19/Nov/21 ] |
|
Reopening for reconsideration, maybe as part of CQF This one was part of PM-416 but the EPIC was closed as Won't do |
| Comment by Sam Ofavezi [ 04/Dec/15 ] |
|
We are hitting this problem and we really need this to get resolved asap. I'm sure other Mongo users have the same problem as this is a problem with plan engine and this is core to mongo and any user is relying on the plan engine to work properly. |