[SERVER-12869] Index null values and missing values differently Created: 24/Feb/14 Updated: 21/Sep/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | 2.6.0-rc0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Samantha Ritter (Inactive) | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 31 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
A document with a missing value are stored as if they were
X and Y look identical from within a standard index on { a : 1 }
However a sparse index on { a : 1 } will treat X and Y differently, We can safely use an index on $exists queries in the following cases: |
| Comments |
| Comment by Chris Harris [ 14/Sep/23 ] | |||||||||||||||||
|
Hi ewan.higgs@deliverect.com, ksa, Watchers, and Voters - thanks for the interest in this functionality. With respect to guidance in this area, a Covered Query section presently exists in our documentation. That page outlines some limitations for when the covering technique can be utilized, including the case when there is a reference to null in the query. We have been discussing with the team and have opened a request to improve the discoverability and applicability of this information from some of the other pages. With respect to the general behavior discussed in this ticket, we have made progress over the years in addressing as much of the gap as possible at query execution layer. There are a lot of moving parts when it comes to delivering enhancements associated with the foundational index structure. This will take some time to accomplish safely, but we understand the important performance limitations represented by this ticket. Stay tuned for further updates in the future. | |||||||||||||||||
| Comment by Sergei Kuzmin [ 30/Aug/23 ] | |||||||||||||||||
|
Anyone? | |||||||||||||||||
| Comment by Sergei Kuzmin [ 09/Jun/22 ] | |||||||||||||||||
|
chris.kelly@mongodb.com | |||||||||||||||||
| Comment by Sergei Kuzmin [ 08/Jun/22 ] | |||||||||||||||||
|
MongoDB version: 5.0.8-7
Let's have {a: 1}, {a: 1, b: null}, {a: 1, b: 2} and index on field b. Because {b: null} should be included in {$exists: true} results, we would need to include `null` type into our scan range. So both existing and non-existing values would be included which means full index scan + fetch/filtering by looking into field presence in the document. So full table scan with filtering is more efficient than using the index. I think the right way would be to add (actually just reserve for indexing purposes) `absent` (`notexists`) BSON type, e.g. absent=-2 or absent=0. This type should be disallowed in documents (documents in BSON format should never contain it) but used in non-sparse indexes. An additional benefit would be `not exists` partial index. In my use case, I was trying to implement a work queue. Jobs could spawn sub-jobs, which would have non-empty `parentJobId`. After initial batch scheduling, I have many "parent" jobs with absent `parentJobId`. The idea was to process child jobs first and then parent jobs. But unexpectedly the below caused full scan despite having index on parentJobId:
For the `$exists: false}` case:
As I had no {"parentJobId": null} documents the above worked well. It wouldn't be well if I had much more such documents than documents with absent parentJobId. | |||||||||||||||||
| Comment by Ewan Higgs [ 11/Feb/22 ] | |||||||||||||||||
|
This was opened 8 years ago. Will it ever be prioritised to be fixed? Otherwise, perhaps turn it into a documentation ticket since this is how things work.
https://docs.mongodb.com/manual/core/index-sparse/ contains the following:
We can divine the fact that a normal index will have null indistinguishable from {"$exists": false} but it's the kind of documentation that only works backwards. i.e. you will only divine this if you know the underlying facts and you can determine the statement is true. If you are using docs to learn about the system you won't make this connection (because you won't be looking at the sparse index document you will be looking at the normal index document).
The string "null" does not appear in the following docs: https://docs.mongodb.com/manual/core/index-single/ https://docs.mongodb.com/manual/core/index-compound/ https://docs.mongodb.com/manual/applications/indexes/ https://docs.mongodb.com/manual/tutorial/create-indexes-to-support-queries/ |