[SERVER-12869] Index null values and missing values differently Created: 24/Feb/14  Updated: 21/Sep/23

Status: Backlog
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: 2.6.0-rc0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Samantha Ritter (Inactive) Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 31
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-13298 distinct doesn't use index to distinc... Closed
is depended on by SERVER-18861 Queries matching null value should be... Backlog
Duplicate
is duplicated by SERVER-52845 $exists works slowly Closed
is duplicated by SERVER-67378 Index null values and missing values ... Closed
is duplicated by SERVER-17376 Allow $exists to efficiently use indexes Closed
Related
related to SERVER-23229 Projection incorrectly returns null v... Backlog
related to SERVER-67378 Index null values and missing values ... Closed
related to SERVER-54811 Bad query plans for $exists:true queries Backlog
is related to SERVER-6293 Index only query fills in missing val... Closed
is related to SERVER-12557 $exists could use index Closed
is related to SERVER-55065 Null queries should be covered by ind... Closed
is related to SERVER-14832 Distinct command result set may inclu... Needs Scheduling
is related to SERVER-18861 Queries matching null value should be... Backlog
is related to SERVER-12262 Use full index scan as proxy for coll... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

A document with a missing value are stored as if they were
explicitly given the value 'null'. Given:

X = { b : 1 }                                                                                                                                                                                  
Y = { a : null, b : 1 }

X and Y look identical from within a standard index on

{ a : 1 }

However a sparse index on

{ a : 1 }

will treat X and Y differently,
storing Y and not storing X.

We can safely use an index on $exists queries in the following cases:
{a:{ $exists:true }} - normal index helps, but we must still fetch
{a:{ $exists:true }} - sparse index is exact
{a:{ $exists:false }} - normal index requires a fetch
{a:{ $exists:false }} - never



 Comments   
Comment by Chris Harris [ 14/Sep/23 ]

Hi ewan.higgs@deliverect.com, ksa, Watchers, and Voters - thanks for the interest in this functionality.

With respect to guidance in this area, a Covered Query section presently exists in our documentation. That page outlines some limitations for when the covering technique can be utilized, including the case when there is a reference to null in the query. We have been discussing with the team and have opened a request to improve the discoverability and applicability of this information from some of the other pages.

With respect to the general behavior discussed in this ticket, we have made progress over the years in addressing as much of the gap as possible at query execution layer. SERVER-55065 represents one specific example of that. As that other ticket suggests, addressing the underlying cause of the issue requires a change to the storage format used by indexes. Such a change necessitates that we introduce a new version for indexes (presumably "v3"). We are presently reviewing what enhancements we would like to include in that next iteration, with this issue being a one of the most prominent.

There are a lot of moving parts when it comes to delivering enhancements associated with the foundational index structure. This will take some time to accomplish safely, but we understand the important performance limitations represented by this ticket. Stay tuned for further updates in the future. 

Comment by Sergei Kuzmin [ 30/Aug/23 ]

Anyone?

Comment by Sergei Kuzmin [ 09/Jun/22 ]

chris.kelly@mongodb.com, benety.goh@mongodb.com, milena.ivanova@mongodb.com Can you please help triage?

Comment by Sergei Kuzmin [ 08/Jun/22 ]

MongoDB version: 5.0.8-7
I've spent a day trying to understand why the index doesn't work at all. After figuring out, was about to file an issue, and then found this one. I think this is a major issue and surprisingly it wasn't addressed yet. AFAIS the statement in the issue description is (at least now) incorrect. Actually:

{a:{ $exists: true }} - the index here DOESN'T help and requires FETCH
{a:{ $exists: false }} - index helps and requires FETCH (was correct).                                The FETCH is non-trivial, i.e. it may actually filter out documents

Let's have {a: 1}, {a: 1, b: null}, {a: 1, b: 2} and index on field b. Because {b: null} should be included in {$exists: true} results, we would need to include `null` type into our scan range. So both existing and non-existing values would be included which means full index scan + fetch/filtering by looking into field presence in the document. So full table scan with filtering is more efficient than using the index.

I think the right way would be to add (actually just reserve for indexing purposes) `absent` (`notexists`) BSON type, e.g. absent=-2 or absent=0. This type should be disallowed in documents (documents in BSON format should never contain it) but used in non-sparse indexes. An additional benefit would be `not exists` partial index. 

In my use case, I was trying to implement a work queue. Jobs could spawn sub-jobs, which would have non-empty `parentJobId`. After initial batch scheduling, I have many "parent" jobs with absent `parentJobId`. The idea was to process child jobs first and then parent jobs. But unexpectedly the below caused full scan despite having index on parentJobId:

# Index rejected as it would require [MinKey, MaxKey] IXSCAN
db.jobs.find({parentJobId: {$exists: true}}).limit(1)
# This one is better. Still does [MinKey, undefined) U (null, MaxKey] scan
db.jobs.find({parentJobId: {$ne: null}}).limit(1)
# Proper current workaround. Scans [ObjectId('000000000000000000000000'),
# ObjectId('ffffffffffffffffffffffff') which does single seek + single document fetch
db.jobs.find({parentJobId: {$type: "objectid"}}).limit(1)

For the `$exists: false}` case:

#  Performs [null, null] IXSCAN and {"parentJobId": {"$not" : {$exists: true}}} FETCH
db.jobs.find({parentJobId: {$exists: false}}).limit(1)
# This one is surprising. It does IXSCAN for [undefined, undefined]U[null, null].
# Why to search for (deprecated?) `undefined` type if I explicitely search for just null
db.jobs.find({parentJobId: null}).limit(1)
# The workaround. Does [null, null] IXSCAN. Returns only docs with null values,
# not ones with missing field
db.jobs.find({parentJobId: {$type: 'null'}}).limit(1)

As I had no {"parentJobId": null} documents the above worked well. It wouldn't be well if I had much more such documents than documents with absent parentJobId.

Comment by Ewan Higgs [ 11/Feb/22 ]

This was opened 8 years ago. Will it ever be prioritised to be fixed? Otherwise, perhaps turn it into a documentation ticket since this is how things work.

 

https://docs.mongodb.com/manual/core/index-sparse/ contains the following:

 

Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value. The index skips over any document that is missing the indexed field. The index is "sparse" because it does not include all documents of a collection. By contrast, non-sparse indexes contain all documents in a collection, storing null values for those documents that do not contain the indexed field.

We can divine the fact that a normal index will have null indistinguishable from {"$exists": false} but it's the kind of documentation that only works backwards. i.e. you will only divine this if you know the underlying facts and you can determine the statement is true. If you are using docs to learn about the system you won't make this connection (because you won't be looking at the sparse index document you will be looking at the normal index document).

 

The string "null" does not appear in the following docs:

https://docs.mongodb.com/manual/core/index-single/

https://docs.mongodb.com/manual/core/index-compound/

https://docs.mongodb.com/manual/applications/indexes/

https://docs.mongodb.com/manual/tutorial/create-indexes-to-support-queries/

Generated at Thu Feb 08 03:29:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.