[SERVER-18861] Queries matching null value should be fully covered by index Created: 08/Jun/15 Updated: 05/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | 3.0.3 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andrey Hohutkin | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 19 |
| Labels: | index-version, qopt-team, storch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Query Optimization 2021-02-22, Query Optimization 2021-03-08 | ||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Because of issue closing Assume, we have a collectoin with documents like
Also we have an non sparse index on "a" field. If we query all documents where a doesnt exist or null with {a: null}, an engine always fetches documents even having field indexed. Need to improve query planner to allow query null valued field and to build an index which can detect collisions with array values. |
| Comments |
| Comment by Alya Berciu [ 27/Oct/21 ] | |||||||||||
|
fuat.sungur if the index is not multikey, we use a count scan (no fetch stage) for a count on an indexed field with {$eq: null} in 4.9.0 and above. So, for example, for a non-multikey index on field a, we can avoid a fetch stage for a query like the following: db.c.count({a: {$eq: null}}) In this instance, instead of fetching the documents, we use two count scans joined by an OR stage to match both null and undefined. | |||||||||||
| Comment by Fuat Sungur [ 24/Oct/21 ] | |||||||||||
|
alya.berciu, we have the similar problem with a customer that, $eq:null will cause unnecessary FETCH, however they are on version 4.4. Can we confirm that with the implementation of | |||||||||||
| Comment by Denys Isaiev [ 04/Oct/21 ] | |||||||||||
|
Thanks, mongo upgrade indeed resolved our issue | |||||||||||
| Comment by Alya Berciu [ 16/Sep/21 ] | |||||||||||
|
isaevdan@gmail.com the first option (for non-multikey indexes) is implemented in versions 4.9.0 and above (see | |||||||||||
| Comment by Denys Isaev [ 16/Sep/21 ] | |||||||||||
|
Thanks for replies So as I understand there are two options:
My question would be then if it is possible to implement any of the optimizations above? In general what is the recommendation to workaround such cases of filtering with null value? | |||||||||||
| Comment by Alya Berciu [ 15/Sep/21 ] | |||||||||||
|
isaevdan@gmail.com , charlie.swanson is correct: in your first example, we don’t use a COUNT_SCAN stage, nor do we just use an IXSCAN stage without a FETCH, because the index is multikey and therefore cannot differentiate between the [], null, and missing values for the indexed field ‘Value’. For your second example, the execution plan would be using COUNT_SCANs with no FETCH stage if your index is not multikey. Your index may be multikey if at some point a document in your collection had a 'Value' field that was an array. To elaborate on Charlie's explanation, because the index itself stores null, missing, and [] in the same way, if we counted what the index scan returns for this predicate, we would also count empty arrays, which is incorrect. That’s why we need to fetch the document first, so we can see the actual value of ‘Value’, which based on the index would include any documents whose ‘Value’ field is either null, missing, or [], and then filter out documents where the ‘Value’ field is []. Therefore we wouldn’t be able to eliminate the filter stage at all for this specific predicate on a multikey index, though we should be able to do it for a predicate which includes empty arrays (I don't think we currently do this, however). | |||||||||||
| Comment by Denys Isaev [ 15/Sep/21 ] | |||||||||||
|
I didn't know that array value marks index as 'multikey', and added that values just to illustrate that points(2,4) from this comment do not apply https://jira.mongodb.org/browse/SERVER-18653?focusedCommentId=931817&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-931817 (I don't know why links is no visible in initial comment). In fact I do have only 2 cases { Value: null } { Value: \{ _id: someid and other date}}
I don't even have cases without field Value in place. And when I try to count all records with { Value : null } I end up with fetch stage and slow query So can this case be solved with engine optimizations? | |||||||||||
| Comment by Charlie Swanson [ 15/Sep/21 ] | |||||||||||
|
isaevdan@gmail.com I'd be interested in alya.berciu's opinion as well, but it seems to me that we could not eliminate the fetch stage in your example due to the Value: [] document which makes the index "multikey". This "multikey" tag simply means there was at least one array value inserted, so there are potentially multiple index keys per document. Once the index is multikey, it is not eligible for a COUNT_SCAN because there could be multiple index entries for a single document which would incorrectly inflate the count. In your example that is clearly not the case, but the system metadata does not currently distinguish between you inserting an empty array like you do here vs a full array like Value: [3, 4, 5] or Value: [null, null]. We would not want to count each of 3, 4, 5 separately nor would we want to count both of the nulls separately. See also We might be able to get rid of the filter on Value: {$eq: null} in this case, but I'm not optimistic that it would have a dramatic impact on query performance. | |||||||||||
| Comment by Denys Isaev [ 15/Sep/21 ] | |||||||||||
|
Hi, I have a very specific case and went though all of the discussion around this topic and I think that this case might be optimized in engine without any new features. Please correct me if I'm wrong. My case is that I have a following collection: db.test.insert({Value: []}) and index db.test.createIndex({Value: 1}) I execute query db.test.count({Value: null}) From query plan I see that there is fetch stage with filter Value: {$eq : null} As I understand there is no reason for this, since I don't need to know which case exactly is there(missing field or null value) for count operation. Also since this is a root level field points 2 and 4 from this comment do not apply here. If I'm right is that possible to implement optimization for count operation on root level fields? | |||||||||||
| Comment by Alya Berciu [ 09/Mar/21 ] | |||||||||||
|
Until the new index format is available, this issue cannot be fully fixed. In the meantime, we are using another ticket to track some tactical improvements: https://jira.mongodb.org/browse/SERVER-55065. | |||||||||||
| Comment by Charlie Swanson [ 23/Feb/21 ] | |||||||||||
|
alya.berciu as we discussed during standup, this optimization may still make sense if the index is not "multikey" and has never seen any array values inserted. It becomes more fragile of an optimization, but I think still worthwhile. Assigning this back to you to take another look to make sure that works. | |||||||||||
| Comment by Ronaldo Moura [ 11/Aug/20 ] | |||||||||||
|
Any plans to solve this issue ? Any recommend workaround on this issue ? We have skip and count() commands using a: null, and they are loading all documents, ignoring the existent index Tks. | |||||||||||
| Comment by Daniel Playfair Cal [ 09/Apr/20 ] | |||||||||||
|
OK, perhaps only the possibility of a missing field is enough to cause the problem.
Let me take a step back and try to explain better what I want. I have a collection with documents matching the following schema:
I want to query for documents where a is null. Note that I already know that `a` is either `null` or an ObjectId, even if Mongo doesn't know.
To do this I built the following index:
If I do the simple query:
Then there is an unecessary FETCH stage with a filter for `a: null`, even though the field `a` is not in the projection. Also, in circumstances where there are more fields and other indexes, mongo will sometimes choose to use an index which is much less ideal, which causes a much more serious performance problem than a non covered query. So my question is, can I somehow change my query so that it matches the semantics of the index and it can be covered? I have chosen a projection that does not include the `a` field, so there should be no need for mongo to fetch anything from disk to return that field because I have not asked it to return that field at all. For example, given that missing values are included in null index keys, I could amend my query to also accept missing values:
But this still results in a FETCH stage. Based on this comment, mongo also creates null index keys for the empty array. So I attempted to amend my query again:
But again, there is still an unecessary FETCH stage with a filter for `a` similar to my query, and mongo still does not always prefer to use the index on `a` which is I know is best suited to the query.
Is is possible to create a query that allows mongo to cover it an index by explicitly filtering for all the values that might generate a null key in the index? If so, how can I find what these possible values are? | |||||||||||
| Comment by Asya Kamsky [ 03/Apr/20 ] | |||||||||||
|
To summarize: When collection has an index on a and two documents:
The query {{ {a:null}}} with projection {{ {a:1, _id:0}}} needs to match both the documents (which it can do from the index) but correct values for field a can not be determined from the index and the documents must be fetched. | |||||||||||
| Comment by Asya Kamsky [ 03/Apr/20 ] | |||||||||||
|
> mismatch between the semantics of the query There is not a mismatch here - the issue is that both match null and missing field. However to return that field value, the document has to be examined. In other words, if you query
and there is an index on a we don't know whether to return
or
only from the index, without examining the actual document. Index can fully answer the query {a:null} but it cannot cover the projection for the field from the index alone. | |||||||||||
| Comment by Daniel Playfair Cal [ 03/Apr/20 ] | |||||||||||
|
I have just done some experiments on the latest version (4.2.5) and so far I've failed to find a workaround. I understand that the issue occurs because there is a mismatch between the semantics of the query {a: null} and the index: {a: null}. Some workarounds I tried that did not work (under the assumption that the key a contains either null or a number):
querying in a way that matches the semantics of the index: {a: '$in': [null, []]}It's not clear to me why, but this still results in a FETCH+filter stage that filters even after the index is used to select documents indexes as {a: null}
Using a partial index: createIndex({a: 1}, {partialFilterExpression: {a: null}}) or alternatively (with a matching query): createIndex({a: 1}, {partialFilterExpression: {a: {'$type': 10}}})
None of these workarounds work. Is there any alternative I'm not seeing here to use covered queries involving null values? | |||||||||||
| Comment by Andrey Hvetskovich [ 18/Sep/19 ] | |||||||||||
|
Any plans on resolving this issue? | |||||||||||
| Comment by Kay Agahd [ 05/Jun/19 ] | |||||||||||
|
so you may add "depends on SERVER-12869" | |||||||||||
| Comment by David Storch [ 05/Jun/19 ] | |||||||||||
|
asya, the meaning of $eq:null predicates is to match documents where the path is either missing or contains a literal null value. Therefore, I don't think it makes sense to resolve this as a duplicate of SERVER-12869. They are related, but not precisely the same. Additionally, this ticket describes the problem whereas SERVER-12869 describes a particular set of code changes that wouldn't necessarily resolve this problem. I'm reopening this ticket. | |||||||||||
| Comment by Asya Kamsky [ 03/Jun/19 ] | |||||||||||
|
This issue can only be resolved by implementing/resolving SERVER-12869 since current version of MongoDB indexes stores the same value for null and missing values, and to be able to cover queries that include $eq to null predicate clause and return that field, the index would have to store different values for the two. Closing as a duplicate. | |||||||||||
| Comment by Ramon Fernandez Marina [ 09/Oct/15 ] | |||||||||||
|
andrey.hohutkin@gmail.com, I realized we moved this ticket into "Needs Triage", but we never posted a public comment: this ticket will be considered in a future planning round. There's some ongoing internal discussion about semantics and their implications for existing apps. Updates will be posted to this ticket as they happen. Regards, |