[SERVER-13963] Different Results from nested array query Created: 16/May/14  Updated: 08/Feb/23  Resolved: 20/May/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0, 2.6.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Neil Lunn Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-15727 two-dimension Array $elemMatch bug Closed
Related
is related to DOCS-3434 Doc nested array behavior change from... Closed
is related to SERVER-15266 $elemMatch/$all behavior in version 2... Closed
Operating System: ALL
Steps To Reproduce:

The following issue was raised on Stack Overflow in this question.

The data has been set up like this:

 
db.posts.insert({sents:
    [
        {uni:['a','b','c'],bi:[['a','b'],['b','c']]},
        {uni:['x','y','z'],bi:[['x','y'],['y','z']]}
    ]})
 

The following query when issued in 2.4 versions returns the desired result:

db.posts.find({'sents.bi':{'$elemMatch':{'$in':['a']}}})

and result:

{ 
    "_id" : ObjectId("537595f254bae6dfabddf0c9"), 
     "sents" : [     
         { 
             "uni": [ "a", "b", "c" ],
             "bi": [[ "a", "b" ],  [ "b", "c" ] ] 
         },
         {
             "uni": [ "x", "y", "z" ],  
             "bi": [ [ "x",    "y" ],  [ "y", "z" ] ] 
         } 
     ] 
}

The same query issued in 2.6 does not return any result.

This is questionable as I believe the correct form of the query should be:

db.posts.find({'sents.bi':{'$elemMatch': {$elemMatch: {$in: ['a']}}}})

Which will return the desired result in both versions.

So not sure if the selection is a bug that was present in earlier releases or whether there is something causing that statement to fail in 2.6 releases.

As a side note the question itself refers to creating an index on the collection:

db.posts.ensureIndex({ 'sents.bi'})

Which causes the query to fail in earlier versions. This I believe to be correct due to the nested array than cannot be used as in index value.

The second form of the query will not directly select an index unless that has been hinted in some way.

In 2.6 a hint to that index will produce an error and a sort simply ignores the index. In 2.4 the index will be shown as selected in the explain output, but there will be no bounds selected.

Participants:

 Description   

Querying elements within a nested array structure works differently in 2.6 releases than it does in prior 2.4 releases



 Comments   
Comment by David Storch [ 20/May/14 ]

To be documented in DOCS-3434.

Comment by David Storch [ 20/May/14 ]

Hi all,

The observed behavior change is actually a bug in the 2.4.x versions fixed in 2.6.0. Here's a quick proof that 2.4 was wrong:

> db.posts.drop()
 
> db.posts.insert({sents:
...     [
...         {uni:['a','b','c'],bi:[['a','b'],['b','c']]},
...         {uni:['x','y','z'],bi:[['x','y'],['y','z']]}
...     ]})
 
> db.posts.find({'sents.bi':{'$elemMatch':{'$in':['a']}}}).count()
1 // With no indices, the one document in the collection is returned.
 
> db.posts.ensureIndex({'sents.bi': 1})
> db.posts.find({'sents.bi':{'$elemMatch':{'$in':['a']}}}).count()
0 // With an index, nothing is returned.

A query should always return the same logical result set regardless of the set of indices that are available. 2.4 breaks this invariant, returning different results depending on whether or not the query can be indexed.

This begs the question, what should the result of this query be? Should it return the document or not? I would argue that the answer is "no, it should not return the document". Let's break down the query bit-by-bit. First consider the outer piece,

db.posts.find({'sents.bi': {'$elemMatch': ... }});

Translated from MongoDB query language to English, this roughly means "the field 'sents.bi' is an array which has an element that matches whatever conditions are inside the $elemMatch.". The document indeed has two arrays with the path 'sents.bi', these being

bi: [ ['a','b'], ['b','c'] ]
bi: [ ['x','y'], ['y','z'] ]

In order for the document to match, at least one of the arrays must match the part of the query inside the $elemMatch:

'$in': ['a']

This will only match if one of the arrays has the element 'a'. The array elements are the nested arrays ['a','b'], ['b','c'], ['x','y'], and ['y','z'], and none of these nested arrays are equal to 'a'. The key point here is that MongoDB does not reach recursively into nested arrays either for matching documents or for generating index keys. As an example, see the following shell session run against a 2.4.10 server:

> db.coll.drop()
> db.coll.insert({a: [[1,2], [3,4]]})
> db.coll.find({a: 1}).count()
0 // The document does not match because the outermost array does not have 1 as en element.
> db.coll.find({a: [1,2]}).count()
1 // The document matches because the outermost array has the nested array [1,2] as an element.

Furthermore, note that the index keys for this document when there is an index {a: 1} are not the integers 1, 2, 3, and 4. Rather, there are two index keys: the array [1,2] and the array [3,4].

I hope this helps to clear up the behavior as it stands in the 2.6.x series of production releases. I'm going to resolve this ticket, but will get the process rolling with our documentation team in order to ensure that this appears clearly in the docs. Feel free to reach out with any further questions.

Best,
Dave

Comment by 耀星 张 [ 16/May/14 ]

The hint doesn't give me any error. And it gives me the following execution plan:

db.posts.find({'sents.bi':{'$elemMatch': {$elemMatch: {$in: ['a']}}}}).hint({"sents.bi": 1}).explain()
{
    "cursor" : "BtreeCursor sents.bi_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "sents.bi" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "YX-T420S:27017",
    "filterSet" : false
}

however, it still scanned the other object in the collection that doesn't event have a "sents" property. Is it expected behavior?

Generated at Thu Feb 08 03:33:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.