Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-19742

Equality to null predicates can sometimes be answered using a sparse, compound index

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.6.0
    • Component/s: Querying
    • Labels:
      None
    • Query Optimization

      Suppose you have the equality to null query predicate {"a.b": null}. This predicate matches documents where the path "a.b" contains a literal null or where the path "a.b" is absent:

      > db.version()
      3.1.7-pre-
      > db.c.drop()
      true
      > db.c.insert({a: {b: null}})
      > db.c.insert({a: {b: 3}})
      > db.c.insert({a: {}})
      > db.c.find({"a.b": null})
      { "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } }
      { "_id" : ObjectId("55bff6ce59a0689eb67b396c"), "a" : {  } }
      

      A sparse index with index key pattern {"a.b": 1} does not have index keys for documents missing the path "a.b". Therefore, if the equality to null query were to use a sparse index, it would miss results:

      > db.c.ensureIndex({"a.b": 1}, {sparse: true})
      
      // If we hint the query to use a sparse index, then we fail to return the
      // document {a: {}}. For this reason, it is incorrect for the query engine to
      // assign equality to null predicates to use sparse indices.
      > db.c.find({"a.b": null}).hint({"a.b": 1})
      { "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } }
      
      // Without the hint, the query engine gets the answer right using a collection
      // scan instead of using the index.
      > db.c.find({"a.b": null})
      { "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } }
      { "_id" : ObjectId("55bff6ce59a0689eb67b396c"), "a" : {  } }
      

      Although the general rule is that equality to null predicates can't use sparse indices, this isn't always true. Consider the following example:

      > db.version()
      3.1.7-pre-
      > db.c.drop()
      true
      > db.c.insert({a: {}, c: {}})
      > db.c.insert({a: {b: 1}, c: {}})
      > db.c.ensureIndex({"a.b": 1, "c.d": 1}, {sparse: true})
      > db.c.find({"a.b": 1, "c.d": null})
      { "_id" : ObjectId("55bff81859a0689eb67b396e"), "a" : { "b" : 1 }, "c" : {  } }
      

      This time we have a compound sparse index {"a.b": 1, "c.d": 1}. Compound sparse indices will only omit index keys for documents where both paths "a.b" and "c.d" are absent; the document {a: {}, c: {}} will not be indexed, but the document {a: {b: 1}, c: {}} will, because path "a.b" exists.

      Due to these sparse index semantics, the check for "a.b" equal to 1 in the query ensures that all matching documents have index keys in the sparse index. As a result, it is actually safe to answer "c.d": null using the sparse index. However, the current query engine does not constrain the bounds on "c.d":

      > db.c.find({"a.b": 1, "c.d": null}).explain("queryPlanner")
      ...
      					"stage" : "IXSCAN",
      					"keyPattern" : {
      						"a.b" : 1,
      						"c.d" : 1
      					},
      					"indexName" : "a.b_1_c.d_1",
      					"isMultiKey" : false,
      					"isUnique" : false,
      					"isSparse" : true,
      					"isPartial" : false,
      					"indexVersion" : 1,
      					"direction" : "forward",
      					"indexBounds" : {
      						"a.b" : [
      							"[1.0, 1.0]"
      						],
                                                      // MISSING OPTIMIZATION! The bounds are not constrained.
      						"c.d" : [
      							"[MinKey, MaxKey]"
      						]
      ...
      }
      

      As an optimization, we could use bounds [[null, null]] for the trailing field "c.d".

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            david.storch@mongodb.com David Storch
            Votes:
            3 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated: