Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19114

Rooted $or queries with a sort may select a SORT plan over a potentially faster SORT_MERGE plan

    • Type: Icon: Bug Bug
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Querying
    • Query Optimization
    • ALL
    • Hide
      db.coll.drop();
      db.coll.ensureIndex({a: 1});
      db.coll.ensureIndex({b: 1});
      for (var i = 0; i < 10; i++) { db.coll.insert({b: 1}); }
      
      // A SORT plan will be selected. The SORT_MERGE plan is not considered.
      db.coll.explain("queryPlanner").find({$or: [{a: 1, b: 1}, {b: 1}]}).sort({b: 1});
      
      Show
      db.coll.drop(); db.coll.ensureIndex({a: 1}); db.coll.ensureIndex({b: 1}); for ( var i = 0; i < 10; i++) { db.coll.insert({b: 1}); } // A SORT plan will be selected. The SORT_MERGE plan is not considered. db.coll.explain( "queryPlanner" ).find({$or: [{a: 1, b: 1}, {b: 1}]}).sort({b: 1});

      Issue Status as of March 9, 2020

      ISSUE DESCRIPTION AND IMPACT

      Queries consisting of a single $or element at the top level of the query JSON are called "rooted $or" queries. For example:

      • {$or: [{a: 1}, {b: 1}]} is a rooted $or query
      • {c:1, $or: [{a: 1}, {b: 1}]} is not

      When a rooted $or query includes a sort specification, the MongoDB query planner can choose a sub-optimal query plan. This is because the planner evaluates each $or clause independently and doesn't evaluate each clause with the knowledge that they are being sorted in the same way.

      As a result, one or more $or clauses may use indexes that do not provide results in the order required by the sort, necessitating a potentially slow blocking SORT stage in the execution plan.

      DIAGNOSIS AND AFFECTED VERSIONS

      This issue can be confirmed using explain(). When a blocking sort is being used, a SORT stage will be present (instead of SORT_MERGE) in the .explain("queryPlanner") output.

      This issue affects all currently supported versions of MongoDB.

      REMEDIATION AND WORKAROUNDS

      For cases where all $or clauses can use the same index, one possible workaround is to use hint to that query to force a merge sort. The presence of a SORT_MERGE stage in the explain output indicates that a merge sort is being used. The plan chosen for each clause of the $or is producing the requested sort order, and a total order can be obtained by merging the results from each sorted stream. 

      Another possible workaround to force a merge sort is to constrain index consideration using an index filter. Note that index filters do not persist across server restarts and must be configured individually on each node in the cluster.

      Additionally, It is necessary to ensure:

      • Indexes for each $or clause exist and provide results in the desired sort order.
      • There are no indexes for each $or clause that would require a blocking sort to satisfy the desired sort order.

      Rooted $or queries are planned by a special "subplanner" mechanism in which the best plan for each $or clause is selected individually. For rooted $or queries with a sort, the subplanner's "local" optimization of each clause may result in a suboptimal global query plan.

      In particular, the subplanner can cause us to choose a plan with a blocking SORT stage, entirely missing the fact that a non-blocking SORT_MERGE plan is available. Suppose that there is a rooted $or query with two clauses and a sort:

      db.collection.find({$or: [clause1, clause2]}).sort(sortSpec);
      

      Suppose that there are two plans for clause1:

      • plan1, which provides the sort sortSpec, and
      • plan2, which does not provide the sort sortSpec.

      If plan2 is a better plan for clause1, then the subplanner will always select it. However, if the winning plan for clause2 provides the sort sortSpec, then it may actually be better to select plan1 for clause1. Selecting the plan that provides the sort means that both subplans provide the sort, and therefore they can be merged using the merge phase of the mergeSort algorithm, rather than performing a full in-memory sort. The subplanner does not currently evaluate this tradeoff, but rather always chooses plan2.

      See repro steps for a simple script to reproduce.

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            2 Vote for this issue
            Watchers:
            30 Start watching this issue

              Created:
              Updated: