[SERVER-19742] Equality to null predicates can sometimes be answered using a sparse, compound index Created: 03/Aug/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-36465 Non-multikey sparse indexes can be us... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Suppose you have the equality to null query predicate {"a.b": null}. This predicate matches documents where the path "a.b" contains a literal null or where the path "a.b" is absent:

> db.version()
3.1.7-pre-
> db.c.drop()
true
> db.c.insert({a: {b: null}})
> db.c.insert({a: {b: 3}})
> db.c.insert({a: {}})
> db.c.find({"a.b": null})
{ "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } }
{ "_id" : ObjectId("55bff6ce59a0689eb67b396c"), "a" : {  } }

A sparse index with index key pattern {"a.b": 1} does not have index keys for documents missing the path "a.b". Therefore, if the equality to null query were to use a sparse index, it would miss results:

> db.c.ensureIndex({"a.b": 1}, {sparse: true})
 
// If we hint the query to use a sparse index, then we fail to return the
// document {a: {}}. For this reason, it is incorrect for the query engine to
// assign equality to null predicates to use sparse indices.
> db.c.find({"a.b": null}).hint({"a.b": 1})
{ "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } }
 
// Without the hint, the query engine gets the answer right using a collection
// scan instead of using the index.
> db.c.find({"a.b": null})
{ "_id" : ObjectId("55bff6c559a0689eb67b396a"), "a" : { "b" : null } }
{ "_id" : ObjectId("55bff6ce59a0689eb67b396c"), "a" : {  } }

Although the general rule is that equality to null predicates can't use sparse indices, this isn't always true. Consider the following example:

> db.version()
3.1.7-pre-
> db.c.drop()
true
> db.c.insert({a: {}, c: {}})
> db.c.insert({a: {b: 1}, c: {}})
> db.c.ensureIndex({"a.b": 1, "c.d": 1}, {sparse: true})
> db.c.find({"a.b": 1, "c.d": null})
{ "_id" : ObjectId("55bff81859a0689eb67b396e"), "a" : { "b" : 1 }, "c" : {  } }

This time we have a compound sparse index {"a.b": 1, "c.d": 1}. Compound sparse indices will only omit index keys for documents where both paths "a.b" and "c.d" are absent; the document {a: {}, c: {}} will not be indexed, but the document {a: {b: 1}, c: {}} will, because path "a.b" exists.

Due to these sparse index semantics, the check for "a.b" equal to 1 in the query ensures that all matching documents have index keys in the sparse index. As a result, it is actually safe to answer "c.d": null using the sparse index. However, the current query engine does not constrain the bounds on "c.d":

> db.c.find({"a.b": 1, "c.d": null}).explain("queryPlanner")
...
					"stage" : "IXSCAN",
					"keyPattern" : {
						"a.b" : 1,
						"c.d" : 1
					},
					"indexName" : "a.b_1_c.d_1",
					"isMultiKey" : false,
					"isUnique" : false,
					"isSparse" : true,
					"isPartial" : false,
					"indexVersion" : 1,
					"direction" : "forward",
					"indexBounds" : {
						"a.b" : [
							"[1.0, 1.0]"
						],
                                                // MISSING OPTIMIZATION! The bounds are not constrained.
						"c.d" : [
							"[MinKey, MaxKey]"
						]
...
}

As an optimization, we could use bounds [[null, null]] for the trailing field "c.d".


Generated at Thu Feb 08 03:51:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.