[SERVER-27442] Positional path component in match language should not act as both an array index and a field name for a single document Created: 16/Dec/16 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Styvane Soukossi | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | index-version, mql-semantics, query-44-grooming, storch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Issue Status as of April 29, 2020 ISSUE SUMMARY Mongo Query Language (MQL) allows referencing both subfields and array positional elements in a convenient but ambiguous way. When the data model varies between array elements or when field names inside of arrayed subdocuments are numbers (e.g., "2"), this can make it difficult to:
When querying for numeric field paths on arrays that contain subdocuments, more results are returned than may be expected. For example, the query {"a.2": 99} matches all of:
Note how {"a": [{"2": 99}, 1, 2]} matches because a.2 is 99, independent of the subdocument's position in the array. The following documents would all match {"a.2": {$eq: null}}:
That is, a document matches the {"a.2": {$eq: null}} query if ANY of the following are true:
There isn't currently a way to disambiguate this behavior without backwards-breaking changes to MQL, but this issue is under investigation. WORKAROUNDS If an array will contain any subdocuments, use caution when storing field names that are numbers. Consider avoiding the combination of subfield queries and positional queries on arrays. As always, test that query results are correct for your application's logic. If it is necessary for application logic to mix both query types, and unexpected results occur, the following workarounds are available:
AFFECTED VERSIONS This ambiguity affects all release versions of MongoDB. Original descriptionSuppose we have the following documents in our collection:
The following query:
yields:
As we can see the document with sub-document in the array at index 3 is returned as well. Of course to get the expected result, one has to use the `$type` operator to match the BSON type. Is this the expected result? If yes can we add it to the documentation? source: MongoDB not querying `null` within a array at a given posision correctly |
| Comments |
| Comment by Asya Kamsky [ 21/May/19 ] | ||||||||||||||||||||||
|
I believe this can be solved only with a breaking change to the language (one that disambiguates array index reference from field name reference). | ||||||||||||||||||||||
| Comment by Asya Kamsky [ 19/Mar/19 ] | ||||||||||||||||||||||
|
darush EDIT never mind, I just realized that this is slightly different issue.
| ||||||||||||||||||||||
| Comment by David Storch [ 13/Mar/19 ] | ||||||||||||||||||||||
|
I've investigated this issue, and this is not an implementation flaw, but rather a bizarre consequence of the semantics that the MongoDB Query Language has chosen for positional path components such as the "2" in the field path "a.2". This is clearly undesirable behavior, so I hesitate to say this is "not a bug", but the fix would involve changing the semantics of the language, and thus could be viewed as backwards breaking. To demonstrate the unusual semantics of positional path components, take the following example:
The document with _id:2 matches in the way that users might expect: it has the queried-for value of 99 in the third position of the array "a", clearly pointed at by the path "a.2". However, in addition to selecting the array element in the third position, the path "a.2" has another meaning. This is demonstrated by the other matching document with _id:1. If the array "a" contains any nested documents, then "2" will be traversed as though it were a regular field name. Thus, the document {"_id" : 1, "a" : [1, 2, 3, { "2" : 99 }]} matches the query {"a.2": 99} by analogy to the document {a: [{b: 99}]} matching the query {"a.b": 99}. Of course, things get just a bit more surprising when it comes to equality-to-null queries. Equality-to-null queries have non-existence semantics. That is, {a: {$eq: null}} matches documents where either the field "a" contains a literal null, or the field "a" does not exist. These existence semantics apply in the case of implicit array traversal as well:
In this example, the document matches because the field path "a.b" does not exist. If we use the string "2" instead of "b", the same principle holds:
And the same principle continues to hold even if we add some non-null scalar elements to the beginning of the array:
To put the whole story together, for a path like "a.2", the positional component is treated both as an array index and a regular field name. Therefore, the query {"a.2": {$eq: null}} will match whenever "a" is an array containing a nested object where the nested object does not contain the field name "2". I'm going to move this ticket back into the "Needs Scheduling" state. We should decide whether it would be acceptable to make a breaking change to the semantics of MQL in order to fix this behavior. | ||||||||||||||||||||||
| Comment by Darya Savelova [ 04/May/18 ] | ||||||||||||||||||||||
|
Server version 3.6.4 still have this issue. And if nullable array field is covered with index like this
query result is correct
| ||||||||||||||||||||||
| Comment by Kelsey Schubert [ 16/Dec/16 ] | ||||||||||||||||||||||
|
Hi styvane, Thanks for the report. We're investigating this behavior. Kind regards, |