[SERVER-27442] Positional path component in match language should not act as both an array index and a field name for a single document Created: 16/Dec/16  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Styvane Soukossi Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: index-version, mql-semantics, query-44-grooming, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Fix_for_coll_scan.patch    
Issue Links:
Depends
is depended on by SERVER-55739 Incorrect filter result when using a ... Closed
Duplicate
is duplicated by SERVER-14877 Matcher incorrect on null equality qu... Closed
is duplicated by SERVER-64221 Paths with numeric components in fore... Closed
Related
related to SERVER-47935 findOneAndUpdate and update insert ob... Closed
is related to SERVER-14877 Matcher incorrect on null equality qu... Closed
is related to SERVER-31876 $ne has inconsistent behavior Closed
is related to SERVER-58521 Dot notation updates with arrays can ... Closed
is related to SERVER-67379 Accessing array element with invalid ... Closed
Assigned Teams:
Query Optimization
Operating System: ALL
Participants:
Case:

 Description   
Issue Status as of April 29, 2020

ISSUE SUMMARY

Mongo Query Language (MQL) allows referencing both subfields and array positional elements in a convenient but ambiguous way. When the data model varies between array elements or when field names inside of arrayed subdocuments are numbers (e.g., "2"), this can make it difficult to:

  • check for the existence or non-existence of an array element.
  • filter on values at a specific position in an array

When querying for numeric field paths on arrays that contain subdocuments, more results are returned than may be expected.

For example, the query {"a.2": 99} matches all of:

{"a": { "2": 99}}
{"a": [1, 32, 99]}
{"a" : [{ "2" : 99}, 1, 2]}

Note how {"a": [{"2": 99}, 1, 2]} matches because a.2 is 99, independent of the subdocument's position in the array.

The following documents would all match {"a.2": {$eq: null}}:

{"a": [{"2":null}]}
{"a": [{}]}
{"a": [1, 2, {}]}

That is, a document matches the {"a.2": {$eq: null}} query if ANY of the following are true:

  • the second element of the a array does not exist
  • the second element of the a array exists and is explicitly null
  • ANY document in the a array does not have a "2" field
  • ANY document in the a array has a 2 field that is explicitly null.

There isn't currently a way to disambiguate this behavior without backwards-breaking changes to MQL, but this issue is under investigation.

WORKAROUNDS

If an array will contain any subdocuments, use caution when storing field names that are numbers. Consider avoiding the combination of subfield queries and positional queries on arrays. As always, test that query results are correct for your application's logic.

If it is necessary for application logic to mix both query types, and unexpected results occur, the following workarounds are available:

  • If testing the existence of fields in array subdocuments, use $elemMatch even though it is technically not supposed to be necessary for query filters on single array fields.
  • If querying array elements by position, use the aggregation framework to take advantage of the $arrayElemAt operator.

AFFECTED VERSIONS

This ambiguity affects all release versions of MongoDB.

Original description

Suppose we have the following documents in our collection:

{
     "_id" : ObjectId("585399ed6ba942932efdc4ce"), 
     "item" : [ 1, 2, 3, null ]
},
{
     "_id" : ObjectId("585399ed6ba942932efdc4cf"), 
    "item" : [ 4, 5, 6, 7 ] 
},
{
    "_id" : ObjectId("58539a3c6ba942932efdc4d0"), 
   "item" : [ 8, 9, 10, { "another_item" : 11 } ] 
}

The following query:

db.coll.find({"item.3": null})

yields:

{
     "_id" : ObjectId("585399ed6ba942932efdc4ce"), 
    "item" : [ 1, 2, 3, null ] 
}
{ 
    "_id" : ObjectId("58539a3c6ba942932efdc4d0"), 
    "item" : [ 8, 9, 10, { "another_item" : 11 } ] 
}

As we can see the document with sub-document in the array at index 3 is returned as well.

Of course to get the expected result, one has to use the `$type` operator to match the BSON type.

Is this the expected result? If yes can we add it to the documentation?

source: MongoDB not querying `null` within a array at a given posision correctly



 Comments   
Comment by Asya Kamsky [ 21/May/19 ]

I believe this can be solved only with a breaking change to the language (one that disambiguates array index reference from field name reference).

Comment by Asya Kamsky [ 19/Mar/19 ]

darush please note that SERVER-27646 fixed the index related issue, so this ticket is tracking the "strange" semantics of the query language when the field path contains a number (and it can refer to position or to numeric field name).

EDIT never mind, I just realized that this is slightly different issue.

 

Comment by David Storch [ 13/Mar/19 ]

I've investigated this issue, and this is not an implementation flaw, but rather a bizarre consequence of the semantics that the MongoDB Query Language has chosen for positional path components such as the "2" in the field path "a.2". This is clearly undesirable behavior, so I hesitate to say this is "not a bug", but the fix would involve changing the semantics of the language, and thus could be viewed as backwards breaking.

To demonstrate the unusual semantics of positional path components, take the following example:

MongoDB Enterprise > db.c.drop()
true
MongoDB Enterprise > db.c.insert({_id: 1, a: [1, 2, 3, {"2": 99}]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.insert({_id: 2, a: [1, 2, 99, {"2": 3}]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.find({"a.2": 99})
{ "_id" : 1, "a" : [ 1, 2, 3, { "2" : 99 } ] }
{ "_id" : 2, "a" : [ 1, 2, 99, { "2" : 3 } ] }

The document with _id:2 matches in the way that users might expect: it has the queried-for value of 99 in the third position of the array "a", clearly pointed at by the path "a.2". However, in addition to selecting the array element in the third position, the path "a.2" has another meaning. This is demonstrated by the other matching document with _id:1. If the array "a" contains any nested documents, then "2" will be traversed as though it were a regular field name. Thus, the document {"_id" : 1, "a" : [1, 2, 3, { "2" : 99 }]} matches the query {"a.2": 99} by analogy to the document {a: [{b: 99}]} matching the query {"a.b": 99}.

Of course, things get just a bit more surprising when it comes to equality-to-null queries. Equality-to-null queries have non-existence semantics. That is, {a: {$eq: null}} matches documents where either the field "a" contains a literal null, or the field "a" does not exist. These existence semantics apply in the case of implicit array traversal as well:

MongoDB Enterprise > db.c.drop()
true
MongoDB Enterprise > db.c.insert({a: [{c: 1}]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.find({"a.b": {$eq: null}})
{ "_id" : ObjectId("5c894cebc9b795c8f866b8ec"), "a" : [ { "c" : 1 } ] }

In this example, the document matches because the field path "a.b" does not exist. If we use the string "2" instead of "b", the same principle holds:

MongoDB Enterprise > db.c.find({"a.2": {$eq: null}})
{ "_id" : ObjectId("5c894cebc9b795c8f866b8ec"), "a" : [ { "c" : 1 } ] }

And the same principle continues to hold even if we add some non-null scalar elements to the beginning of the array:

MongoDB Enterprise > db.c.insert({a: [1, 2, 3, 4, {c: 1}]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.find({"a.2": {$eq: null}})
{ "_id" : ObjectId("5c894cebc9b795c8f866b8ec"), "a" : [ { "c" : 1 } ] }
{ "_id" : ObjectId("5c894d5ac9b795c8f866b8ed"), "a" : [ 1, 2, 3, 4, { "c" : 1 } ] }

To put the whole story together, for a path like "a.2", the positional component is treated both as an array index and a regular field name. Therefore, the query {"a.2": {$eq: null}} will match whenever "a" is an array containing a nested object where the nested object does not contain the field name "2". I'm going to move this ticket back into the "Needs Scheduling" state. We should decide whether it would be acceptable to make a breaking change to the semantics of MQL in order to fix this behavior.

CC asya kelsey.schubert ramon.fernandez

Comment by Darya Savelova [ 04/May/18 ]

Server version 3.6.4 still have this issue.

And if nullable array field is covered with index like this

db.getCollection("coll").createIndex({ "item.3": 1 })

query result is correct

 

> db.coll.find({"item.3": null})
{{ "_id" : ObjectId("585399ed6ba942932efdc4ce"), "item" : [ 1, 2, 3, null ]}

 

Comment by Kelsey Schubert [ 16/Dec/16 ]

Hi styvane,

Thanks for the report. We're investigating this behavior.

Kind regards,
Thomas

Generated at Thu Feb 08 04:15:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.