Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-3104

index bound improvements for elemMatch query on multikey index

    • Type: Icon: Improvement Improvement
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 2.3.0
    • Affects Version/s: None
    • Component/s: Querying
    • Labels:
      None

      Mongo does not compute a cartesian product when creating a compound index on multiple fields. If the document { a:[

      { b:1, c:2 }

      ,

      { b:10, c:20 }

      ] } is indexed according to index

      { 'a.b':1, 'a.c':1 }

      , the index keys created are

      { '':1, '':2 }

      and

      { '':10, '':20 }

      . (There is no index key

      { '':1, '':20 }

      for example.)

      A) Now, suppose we have a query

      { 'a.b':1, 'a.c':20 }

      . This query is supposed to match the document, because an 'a.b' value of 1 exists in the document, and an 'a.c' value of 20 exists in the document. However, there is no index key containing both 1 in the 'a.b' position and 20 in the 'a.c' position. As a result, the index bounds on 'a.b' will be [[ 1, 1 ]] but there will not be any index bounds on 'a.c'. This means the index key

      { '':1, '':2 }

      will be retrieved and used to find the full document, and the Matcher will determine that the full document matches the query. Here's a demo:

      c = db.c;
      c.drop();
      c.save( { a:[ { b:1, c:2 }, { b:10, c:20 } ] } );
      c.ensureIndex( { 'a.b':1, 'a.c':1 } );
      printjson( c.find( { 'a.b':1, 'a.c':20 } ).explain() );
      

      B) However, if $elemMatch is used in the query then an element of the 'a' array must match all the $elemMatch criteria. In other words, if the query becomes { a:{ $elemMatch:

      { b:1, c:20 }

      } } then the original document will not match because no element of a matches

      { b:1, c:20 }

      . In this case, index keys lacking a match on one field (like

      { '':1, '':2 }

      ) need not be examined, and precise index bounds can be used for the 'a.c' field. Here's a demo:

      c = db.c;
      c.drop();
      c.save( { a:[ { b:1, c:2 }, { b:10, c:20 } ] } );
      c.ensureIndex( { 'a.b':1, 'a.c':1 } );
      printjson( c.find( { a:{ $elemMatch:{ b:1, c:20 } } } ).explain() );
      

      This ticket implements the index bounds behavior seen in B. More precisely, our old behavior was that if two indexed field paths shared a common prefix, then only the first of those field paths appearing in the index would have its index bounds used for the query. With this ticket, if the index bounds for the two field paths come from the same $elemMatch clause then the index bounds on both field paths are used for the query.

      Additionally, this optimization is only applied if the field names within the $elemMatch are undotted. There are some cases where the optimization does not work correctly if the fields are dotted, described in SERVER-7509. Also, this ticket does not affect the behavior for determining which index bounds among a set of candidate index bounds are chosen for a given field. For example, in the query { 'a.b':

      { $gt:10 }

      , a:{ $elemMatch:

      { b:2, c:3 }

      } may use the field path [[10, max number]] on 'a.b' rather than [[2, 2]].

      ---------------------

      // Given a multikey index { 'a.b':1, 'a.c':1 } and query { 'a.b':3, 'a.c':3 } only the index field
      // 'a.b' is constrained to the range [3, 3], while the index field 'a.c' is just constrained
      // to be within minkey and maxkey.  This implementation ensures that the document
      // { a:[ { b:3 }, { c:3 } ] }, which generates index keys { 'a.b':3, 'a.c':null } and
      // { 'a.b':null and 'a.c':3 } will be retrieved for the query.  (See SERVER-958 for more
      // information.)
      //
      // If the query is instead { a:{ $elemMatch:{ b:3, c:3 } } } then the document
      // { a:[ { b:3 }, { c:3 } ] } does not match.  Until SERVER-3104 was implemented, the index
      // constraints would be [3,3] on the 'a.b' field and [minkey,maxkey] on the 'a.c' field, the same as
      // for the non $elemMatch query in the previous paragraph.  With the SERVER-3104 implementation,
      // constraints on two fields within a $elemMatch parent can both be applied to an index.  Due to the
      // SERVER-3104 implementation, the index constraints become [3,3] on the 'a.b' field _and_ [3,3] on
      // the 'a.c' field.
      

            Assignee:
            aaron Aaron Staple
            Reporter:
            aaron Aaron Staple
            Votes:
            22 Vote for this issue
            Watchers:
            23 Start watching this issue

              Created:
              Updated:
              Resolved: