-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: 2.6.0
-
Component/s: Querying
-
None
-
Query Optimization
-
(copied to CRM)
Suppose you have the equality to null query predicate {"a.b": null}. This predicate matches documents where the path "a.b" contains a literal null or where the path "a.b" is absent:
> db.version() 3.1.7-pre- > db.c.drop() true > db.c.insert({a: {b: null}}) > db.c.insert({a: {b: 3}}) > db.c.insert({a: {}}) > db.c.find({"a.b": null}) { "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } } { "_id" : ObjectId("55bff6ce59a0689eb67b396c"), "a" : { } }
A sparse index with index key pattern {"a.b": 1} does not have index keys for documents missing the path "a.b". Therefore, if the equality to null query were to use a sparse index, it would miss results:
> db.c.ensureIndex({"a.b": 1}, {sparse: true}) // If we hint the query to use a sparse index, then we fail to return the // document {a: {}}. For this reason, it is incorrect for the query engine to // assign equality to null predicates to use sparse indices. > db.c.find({"a.b": null}).hint({"a.b": 1}) { "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } } // Without the hint, the query engine gets the answer right using a collection // scan instead of using the index. > db.c.find({"a.b": null}) { "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } } { "_id" : ObjectId("55bff6ce59a0689eb67b396c"), "a" : { } }
Although the general rule is that equality to null predicates can't use sparse indices, this isn't always true. Consider the following example:
> db.version() 3.1.7-pre- > db.c.drop() true > db.c.insert({a: {}, c: {}}) > db.c.insert({a: {b: 1}, c: {}}) > db.c.ensureIndex({"a.b": 1, "c.d": 1}, {sparse: true}) > db.c.find({"a.b": 1, "c.d": null}) { "_id" : ObjectId("55bff81859a0689eb67b396e"), "a" : { "b" : 1 }, "c" : { } }
This time we have a compound sparse index {"a.b": 1, "c.d": 1}. Compound sparse indices will only omit index keys for documents where both paths "a.b" and "c.d" are absent; the document {a: {}, c: {}} will not be indexed, but the document {a: {b: 1}, c: {}} will, because path "a.b" exists.
Due to these sparse index semantics, the check for "a.b" equal to 1 in the query ensures that all matching documents have index keys in the sparse index. As a result, it is actually safe to answer "c.d": null using the sparse index. However, the current query engine does not constrain the bounds on "c.d":
> db.c.find({"a.b": 1, "c.d": null}).explain("queryPlanner") ... "stage" : "IXSCAN", "keyPattern" : { "a.b" : 1, "c.d" : 1 }, "indexName" : "a.b_1_c.d_1", "isMultiKey" : false, "isUnique" : false, "isSparse" : true, "isPartial" : false, "indexVersion" : 1, "direction" : "forward", "indexBounds" : { "a.b" : [ "[1.0, 1.0]" ], // MISSING OPTIMIZATION! The bounds are not constrained. "c.d" : [ "[MinKey, MaxKey]" ] ... }
As an optimization, we could use bounds [[null, null]] for the trailing field "c.d".
- is related to
-
SERVER-36465 Non-multikey sparse indexes can be used to answer query with {$ne:null} predicate
- Closed