[SERVER-13100] Large $in queries are several orders of magnitude slower Created: 07/Mar/14  Updated: 11/Jul/16  Resolved: 12/Mar/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0-rc1
Fix Version/s: 2.6.0-rc2

Type: Bug Priority: Blocker - P1
Reporter: Ben Becker Assignee: Benety Goh
Resolution: Done Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File in_query.cpp     PDF File in_query.pdf     File in_query.svg     File in_query_100_unindexed.svg     File server13100.js    
Issue Links:
Related
related to SERVER-13139 Performance regression with smaller $... Closed
is related to SERVER-1211 don't use simplifiedQuery() to negate... Closed
Operating System: ALL
Steps To Reproduce:

build/run attached cpp file.

Participants:

 Description   

Running the attached client as follows:

ben@zzyzx:~/projects/mongo-utils/build$ ./in_query localhost:27249 100000
Total query time in miliseconds: 389
Doc count: 100000
ben@zzyzx:~/projects/mongo-utils/build$ ./in_query localhost:27261 10000
Total query time in miliseconds: 2103
ben@zzyzx:~/projects/mongo-utils/build$ ./in_query localhost:27261 100000
Total query time in miliseconds: 380121
Doc count: 100000

The last execution never returns a result, and the client is blocked in the call to DBClientCursor::more(). Note that the server logs the slow query at ~230ms on my machine, but the process continues to consume 100% cpu for 380 seconds.



 Comments   
Comment by Githook User [ 12/Mar/14 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-13100 replaced linear search in IndexBoundsChecker::findIntervalForField with std::lower_bound
Branch: master
https://github.com/mongodb/mongo/commit/b28a5a869a0e02d73e25cd39c15adf86c383f319

Comment by Githook User [ 12/Mar/14 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-13100 added test cases for IndexBoundsChecker::findIntervalForField. moved intervalCmp out of header.
Branch: master
https://github.com/mongodb/mongo/commit/79624f53393bc94cff490153f6bc6939ccd076c8

Comment by hari.khalsa@10gen.com [ 07/Mar/14 ]

Ah, so the faster run wasn't indexed. Great, that clears everything up.

Comment by Ben Becker [ 07/Mar/14 ]

Hi Hari/All,

FYI, when I add an ascending index to the 'a' field, performance is the same as '_id'.

I reran without the index, but perftools doesn't have a chance to sample much. So I added a loop to the test's main function which calls query() 100 times. SVG is attached; see in_query_100_unindexed.svg.

ben@zzyzx:~/projects/mongo-utils/build$ make && ./in_query localhost:27261 100000
...
Total query time in miliseconds: 21548
Doc count: 10000000

Happy to run any additional tests – just say the word.

Comment by hari.khalsa@10gen.com [ 07/Mar/14 ]

PS. Ben, can you post a SVG of the profile when it runs with 'a' instead of _id? Not sure about why that's different. Perhaps also the .cpp you used to generate that SVG as well.

Comment by hari.khalsa@10gen.com [ 07/Mar/14 ]

I think I know what's going on.

Each $in clause specifies a new allowed interval for the index key. When we move to a new index key we have to make sure it's in our set of allowed values. If our set of bounds has many intervals we can spend a lot of time checking that it's in our bounds. Each value for $in is another interval.

We can fix this in two ways:

1. Make the "is the value for this field in our bounds for that field" faster. Right now it does a linear scan through a sorted list which is generally small except in cases like this. We can binary search. See here:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/index_bounds.cpp#L512

2. If we have absurdly sized $in clauses, don't index them. Absurd is, say, >N clauses where N is some made-up number so most people aren't impacted. We can do this here (check for MATCH_IN and check the # of clauses):
https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/planner_ixselect.cpp#L157

Comment by Ben Becker [ 07/Mar/14 ]

Done.

Comment by Andy Schwerin [ 07/Mar/14 ]

benjamin.becker, can you please attach the SVG file of the profile?

Comment by Ben Becker [ 07/Mar/14 ]

If you change the query field to 'a' instead of _id (line 18), the response time is much better; in fact faster than v2.4:

ben@zzyzx:~/projects/mongo-utils/build$ ./in_query localhost:27261 100000
Total query time in miliseconds: 223
Doc count: 100000

Generated at Thu Feb 08 03:30:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.