[SERVER-8610] $elemMatch (objects in array) not using index range correctly Created: 19/Feb/13  Updated: 27/Nov/17  Resolved: 26/Feb/13

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.2.3, 2.4.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eduardo Manso Assignee: Aaron Staple
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows Server 2008 R2


Issue Links:
Depends
Duplicate
is duplicated by SERVER-15086 Allow for efficient range queries ove... Closed
Operating System: Windows
Participants:

 Description   

Let's say I have a collection of 100.000 objects like this:

{"user":"bill","created":"2013-10-10","updates":[

{"uDate":"2013-10-10",...}

,

{"uDate":"2013-10-11",...}

]}

And the following index that should be used on my queries:

{"updates.uDate":1}

The query:

db.users.find({"updates":{$elemMatch:{"uDate":{$gte:"2013-10-10",$lt:"2013-10-12"}}}}).count();

if I try to query this way, the results are correct but the usage of index is not, that's why it takes so much time to get results.
This has a total of 559 results, what is correct, but to try to understand the delay on this query, I tried:

db.users.find({"updates":{$elemMatch:{"uDate":{$gte:"2013-10-10",$lt:"2013-10-12"}}}}).explain();

So I got:

{
"cursor" : "BtreeCursor userDate",
"isMultiKey" : true,
"n" : 559,
"nscannedObjects" : 434513,
"nscanned" : 434513,
"nscannedObjectsAllPlans" : 434513,
"nscannedAllPlans" : 434513,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 3,
"nChunkSkips" : 0,
"millis" : 4547,
"indexBounds" : {
"updates.uDate" : [
[
"2013-10-10",
{

}
]
]
},
"server" : "fzdv1:27017"
}

After thinking a lot, I could not realise why the index does not have an End (2013-10-12), just a Start (2013-10-10).. a RANGE.
It could be because it's a MultiKey index, so what about forcing the index to work as expected:

db.users.find().min(

{"updates.uDate":"2013-10-10"}

).max(

{"updates.uDate":"2013-10-12"}

).hint(

{"updates.uDate":1}

).explain();

{
"cursor" : "BtreeCursor userDate",
"isMultiKey" : true,
"n" : 559,
"nscannedObjects" : 559,
"nscanned" : 560,
"nscannedObjectsAllPlans" : 559,
"nscannedAllPlans" : 560,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 9,
"indexBounds" : {
"start" :

{ "updates.uDate" : "2013-10-10" }

,
"end" :

{ "updates.uDate" : "2013-10-12" }

},
"server" : "fzdv1:27017"
}

Perfect..it worked really fast and with a correct result of 559 objects. My "workaround" would solve my problem but not yet.
What I need is to use aggregate function so that I can $unwind the "updates.uDate".
Using $match produces the same index error/delay and, as far as I know, there is no way to force index range ("hint"/"min"/"max") on an aggregate $match.

Because of that, my only solution is to break my users collection into two, "users" and "users_updates". Then I could use something like:

db.users.find({"created":{$gte:"2013-10-10",$lt:"2013-10-12"}}).explain();

{
"cursor" : "BtreeCursor created",
"isMultiKey" : false,
"n" : 126,
"nscannedObjects" : 126,
"nscanned" : 126,
"nscannedObjectsAllPlans" : 126,
"nscannedAllPlans" : 126,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 8,
"indexBounds" :

{ "created" : [ [ "2013-10-10", "2013-10-12" ] ] }

,
"server" : "fzdv1:27017"
}

Why $elemMatch queries only works with indexes (range - start/end) as expected when forced to? Why querying objects in an array of objects does not uses indexes as normal queries since it is technicaly possible, as shown above?

Here is the query that would solve my problem if it worked properly with its index range (start/END).

db.users.aggregate(
{$match : {"updates":{$elemMatch:{"uDate":{$gte:"2013-10-10",$lt:"2013-10-12"}}}}},
{$unwind : "$updates"},
{$project : {"updates.uDate":1}},
{$match : {"updates.uDate":{$gte:"2013-10-10",$lte:"2013-10-12"}}},
{$sort : {"updates.uDate":1}}
);

Am I missing something or it's a real issue?

Thanks.



 Comments   
Comment by Aaron Staple [ 26/Feb/13 ]

Eduardo,

You have proposed that the query { a:{ $elemMatch:{ b:

{ $gte:1, $lte:10 }

} } } use the index bounds between 1 and 10 on index

{ 'a.b':1 }

. This is functionally incorrect under the current query semantics. For example the query { a:{ $elemMatch:{ b:

{ $gte:1, $lte:10 }

} } } has been defined to match the document { a:[

{ b:[ 0, 11 ] }

] }. As you can see, one value in the 'a' array contains both a 'b' value >= 1 and a 'b' value <= 10. Applying index bounds on 'a.b' between 1 and 10 would fail to match this document.

The plan of record for handling this situation remains SERVER-6050.

Comment by Eduardo Manso [ 20/Feb/13 ]

Ok, I've tried your query and it works perfect, the result is correct and it uses the index as expected, but that's a weird workaround, don't you think so?

"upDate" is a property that is supposed to hold one date, only one, turning it into an array would be a workaround, not a plausible solution.

Anyway, that's not the point, I'll use the SERVER-6050 example to get to it.

According to your suggestion, I should wait for the possible "SERVER-6050 improvement" but, in my opinion, SERVER-6050 is neither a workaround nor an improvement, it's a mistake.

The $elemMatch tag, as it's name says, is intended to match an element in an array, so making "$elemMatch" working with anything different from a ARRAY would be a mistake.

Let's get SERVER-6050 example:

document - { a:[

{ b:5 }

,

{ b:15 }

] }
query - { 'a.b':{ $elemMatch:

{ $gte:1, $lte:10 }

} }

"b" is not an array, so it's pretty understandable why this query doesn't work, and it really should not.

So, let's use the logic here. "a" is an array, so, keeping our logic, the correct query would be:

query - { 'a':{ $elemMatch:{ "b":{$gte:1, $lte:10} } } }

Now $elemMatch is related to "a" and the condition

{gte/lte}

is related to "b".

Applying the "SERVER-6050 improvement" does not make any sense, but, using the right query does, and, in fact, this query already WORKS, and THAT'S THE POINT! It works, the result is correct, but it is slow because it not uses the index RANGE as it should. It finds and uses the correct index

{"a.b":1}

, it applies the "b:{$gte:1}" condition on it and, for some crazy and not logical reason, it forgets to apply {b:{lte:10}} condition when it's scanning the index.

Resuming, my question is:

What is the most plausible step now? Make mongo works with this weird query { 'a.b':{ $elemMatch:

{ $gte:1, $lte:10 }

} }, or fix the index issue on an query that seems to be more correct { 'a':{ $elemMatch:{ "b":{$gte:1, $lte:10} } } }, and the most important thing, that already works!!

What I am asking here is not for a new feature or a workaround, I'm asking for an "improvement" or a "fix" on how index works with {b:{$gte:1, $lte:10}} when "b" is an element of an array. In my point of view, since $elemMatch is explicitly assigned to "a", any queries to "b" (an element of "a") should behave like it was independent, not matter if is on root or inside an array, and, as I've had proved before, it is technicaly possible as shown bellow:

db.xxx.find().min(

{"a.b":"2013-10-10"}

).max(

{"a.b":"2013-10-12"}

).hint(

{"a.b":1}

).explain();

Another good improvement could be allowing the above query to work on an aggregate function (hint/min/max).

Thanks

Comment by Aaron Staple [ 19/Feb/13 ]

Hi Eduardo - I think your usage of $elemMatch is a little different from what you want.

For this to work you'll need to change your uDate fields to arrays and use this modified query. Note that if SERVER-6050 is implemented uDate will no longer need to be an array.

Here's a demo:

 = db.c;
c.drop();
 
// Note that the uDate dates are now contained in arrays.                             
c.save( {"user":"bill","created":"2013-10-10","updates":[ {"uDate":["2013-10-10"]} , \
{"uDate":["2013-10-11"]} ] } );
 
c.ensureIndex( {"updates.uDate":1} );
 
// This is the query from the ticket.                                                 
printjson( c.find({"updates":{$elemMatch:{"uDate":{$gte:"2013-10-10",$lt:"2013-10-12"\
}}}}).explain() );
 
// This is a query which will provide desired index bounds.                           
printjson( c.find({"updates.uDate":{$elemMatch:{$gte:"2013-10-10",$lt:"2013-10-12"}}}\
).explain() );
 

Generated at Thu Feb 08 03:17:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.