Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-92912

Target shards more efficiently for top-level OR query plans

    • Query Optimization

      The first step in shard targeting is to plan the query (QueryPlanner::plan) with the shard key as a mock index to figure out what ranges in the shard key are needed to satisfy the query. Once we have a QuerySolution, we extract the IndexBounds (map from shard key field to intervals) and then figure out what chunks intersect these intervals.

      Problem 1: When the QuerySolution is a top-level OR, we can run into issues. Consider the following setup:

      // shard key 
      {a: 1, b: 1} 
      
      // split points
      [{a: 1, b: 1} /* on shard0*/, {a: 1, b: 2} /*shard1*/, {a: 2, b: 1} /*shard2*/, {a: 2, b: 2} /*shard3*/]
      
      // query
      {$or: [{a: 1, b: 1}, {a: 2, b: 2}]} 

      The resulting QuerySolution is an OR of two IXSCANS, where each IXSCAN is a point scan on the shard key. When we extract the IndexBounds for this solution, we end up with field "a" mapped to the point intervals 1 and 2, and field "b" mapped to the point intervals 1 and 2. There's room for improvement here, since we've lost some information about which values of "a" and which values of "b" should be associated.

      Problem 2: The next step is to "flatten" these IndexBounds, essentially turning them back into ranges on the shard key. In this example, the flattened index bounds represent the following ranges:

      { a: 1, b: 1 } -> { a: 1, b: 2 }
      { a: 2, b: 1 } -> { a: 2, b: 2 }

      This was a bit surprising to me. I might expect:

      { a: 1, b: 1 } -> { a: 1, b: 1 }
      { a: 1, b: 2 } -> { a: 1, b: 2 }
      { a: 2, b: 1 } -> { a: 2, b: 1 }
      { a: 2, b: 2 } -> { a: 2, b: 2 }

      Note, this would potentially target too many shards due to problem 1, but it could be better than the ranges we do generate. We seem to do this on purpose, probably to avoid generating too many ranges (see this comment). Still, there may be some room for improvement without going overboard: could we strike a different balance between generating many ranges vs targeting potentially many shards?

        1. simple_repro.js
          1 kB
          Hana Pearlman

            Assignee:
            Unassigned Unassigned
            Reporter:
            hana.pearlman@mongodb.com Hana Pearlman
            Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

              Created:
              Updated: