[SERVER-5063] $in on first compound key element and limit efficiency Created: 24/Feb/12  Updated: 28/Oct/15  Resolved: 09/Apr/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 2.1.1

Type: Improvement Priority: Major - P3
Reporter: Eliot Horowitz Assignee: Aaron Staple
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
Depends
is depended on by SERVER-3310 Query optimizer should efficiently ha... Closed
Related
related to SERVER-5481 query on undefined field matches empt... Closed
related to SERVER-5777 Add support for a range constraint on... Closed
related to SERVER-5450 refactor and optimize FieldRangeVecto... Closed
related to SERVER-6268 report both winning plan nscanned and... Closed
Participants:

 Description   

If you have an index on ( a , b ) and do a query like find( { a :

{ $in : [ ... }

} ).sort(

{ b : 1 }

).limit(10)
an easy optimization is limiting the results for each $in element to the limit.



 Comments   
Comment by auto [ 07/Apr/12 ]

Author:

{u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

Message: SERVER-5063 $in limit efficiency.
Branch: master
https://github.com/mongodb/mongo/commit/7cacde89bb37051549b38abaf4d6b591b199b243

Comment by James Smith [ 09/Apr/12 ]

Out of interest, what was the speed difference for $in queries of this type after this change, do you have any benchmark results?

Comment by Aaron Staple [ 09/Apr/12 ]

Hi James,

The performance improvement you'll see will be very dependent on your data distribution, and it's possible to generate a benchmark with an arbitrary improvement multiple by varying the amount of data in your test. (For example I just ran a simple test for this optimization that showed a four order of magnitude performance improvement.)

Is there a real world example data set you'd like to test?

Comment by Artur Rodrigues [ 30/Jun/12 ]

I would just like to share my experience, where I have the following indexes:

"indexSizes" : {
"id" : 9026304,
"created_at_-1" : 6344576,
"feelings_1_location.state_1_created_at_-1" : 10285408
},

The query:
db.dev06.find({
'feelings':

{ $in: [ 'cansado', 'triste' ] }

,
'location.state' :

{ $in: [ 'Minas Gerais', 'Rio de Janeiro' ] }

}).sort(

{'created_at': -1}

).limit(10).explain();

In v2.0.6 the results were:
http://pastie.org/private/frfkmfuli7uqi6fwiykag

while in v2.1.2 there was a significant increase in the number of objects scanned:
http://pastie.org/private/6dtbsxb6vofzh8yslmicca

Even if I hint in v.2.0.6 the same index used in v2.1.2, it still scans less documents:
http://pastie.org/private/ejgiavgf1reqlq5hwhgpw

Comment by Aaron Staple [ 30/Jun/12 ]

Hi Artur,

I think you may be encountering a difference the explain behavior between 2.0 and 2.1. In 2.1 the nscanned reported for the overall query includes the nscanned values of all candidate plans attempted. In 2.0 only the nscanned of the "winning" query plan is reported.

Could you please send the output of:

db.dev06.find( ... ).explain( true )
db.dev06.find( ... ).hint(

{ feelings:1, 'location.state':1, created_at:-1 }

).explain( true )

for both 2.0 and 2.1? Also, it would be helpful for clarity if you could include the command you enter into the shell with the shell output you post.

Thanks,
Aaron

Comment by Artur Rodrigues [ 30/Jun/12 ]

Hi Aaron,

Thanks for your prompt reply. Please find the information you asked for:
v2.0.6
http://pastie.org/private/dhsoanhnzlvgtkx71o9nq

v2.1.2
http://pastie.org/private/0a33rnyxxqeukdwbevurw

Indeed, as you said, v2.1 is reporting the nscanned values of all candidates. My fault for not reading the changelog throughly.

Anyways, I was expecting a performance boost in v2.1 for this specific query given the compound index, but I think I'll have to wait until SERVER-3310 is solved.

Thank you once again!

Comment by Aaron Staple [ 30/Jun/12 ]

Hi Artur,

No problem. It looks like your 'feelings_1_location.state_1_created_at_-1' index is multikey, which would explain why the optimization is not being used. (It is not supported for multikey indexes.)

Generated at Thu Jun 21 16:57:26 UTC 2018 using JIRA 7.8.2#78002-sha1:944b71ecbe2e09c23503821098ef280c785b44a8.