[SERVER-5063] $in on first compound key element and limit efficiency Created: 24/Feb/12  Updated: 28/Oct/15  Resolved: 09/Apr/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 2.1.1

Type: Improvement Priority: Major - P3
Reporter: Eliot Horowitz Assignee: Aaron Staple
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
is depended on by SERVER-3310 Query optimizer should efficiently ha... Closed
related to SERVER-5481 query on undefined field matches empt... Closed
related to SERVER-5777 Add support for a range constraint on... Closed
related to SERVER-5450 refactor and optimize FieldRangeVecto... Closed
related to SERVER-6268 report both winning plan nscanned and... Closed


If you have an index on ( a , b ) and do a query like find( { a :

{ $in : [ ... }

} ).sort(

{ b : 1 }

an easy optimization is limiting the results for each $in element to the limit.

Comment by auto [ 07/Apr/12 ]


{u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

Message: SERVER-5063 $in limit efficiency.
Branch: master

Comment by James Smith [ 09/Apr/12 ]

Out of interest, what was the speed difference for $in queries of this type after this change, do you have any benchmark results?

Comment by Aaron Staple [ 09/Apr/12 ]

Hi James,

The performance improvement you'll see will be very dependent on your data distribution, and it's possible to generate a benchmark with an arbitrary improvement multiple by varying the amount of data in your test. (For example I just ran a simple test for this optimization that showed a four order of magnitude performance improvement.)

Is there a real world example data set you'd like to test?

Comment by Artur Rodrigues [ 30/Jun/12 ]

I would just like to share my experience, where I have the following indexes:

"indexSizes" : {
"id" : 9026304,
"created_at_-1" : 6344576,
"feelings_1_location.state_1_created_at_-1" : 10285408

The query:

{ $in: [ 'cansado', 'triste' ] }

'location.state' :

{ $in: [ 'Minas Gerais', 'Rio de Janeiro' ] }


{'created_at': -1}


In v2.0.6 the results were:

while in v2.1.2 there was a significant increase in the number of objects scanned:

Even if I hint in v.2.0.6 the same index used in v2.1.2, it still scans less documents:

Comment by Aaron Staple [ 30/Jun/12 ]

Hi Artur,

I think you may be encountering a difference the explain behavior between 2.0 and 2.1. In 2.1 the nscanned reported for the overall query includes the nscanned values of all candidate plans attempted. In 2.0 only the nscanned of the "winning" query plan is reported.

Could you please send the output of:

db.dev06.find( ... ).explain( true )
db.dev06.find( ... ).hint(

{ feelings:1, 'location.state':1, created_at:-1 }

).explain( true )

for both 2.0 and 2.1? Also, it would be helpful for clarity if you could include the command you enter into the shell with the shell output you post.


Comment by Artur Rodrigues [ 30/Jun/12 ]

Hi Aaron,

Thanks for your prompt reply. Please find the information you asked for:


Indeed, as you said, v2.1 is reporting the nscanned values of all candidates. My fault for not reading the changelog throughly.

Anyways, I was expecting a performance boost in v2.1 for this specific query given the compound index, but I think I'll have to wait until SERVER-3310 is solved.

Thank you once again!

Comment by Aaron Staple [ 30/Jun/12 ]

Hi Artur,

No problem. It looks like your 'feelings_1_location.state_1_created_at_-1' index is multikey, which would explain why the optimization is not being used. (It is not supported for multikey indexes.)

Generated at Sun Jan 20 14:57:45 UTC 2019 using Jira 7.12.1#712002-sha1:609a50578ba6bc73dbf8b05dddd7c04a04b6807c.