[SERVER-13100] Large $in queries are several orders of magnitude slower Created: 07/Mar/14 Updated: 11/Jul/16 Resolved: 12/Mar/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.6.0-rc1 |
| Fix Version/s: | 2.6.0-rc2 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Ben Becker | Assignee: | Benety Goh |
| Resolution: | Done | Votes: | 0 |
| Labels: | 26qa | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | build/run attached cpp file. |
||||||||||||
| Participants: | |||||||||||||
| Description |
|
Running the attached client as follows:
The last execution never returns a result, and the client is blocked in the call to DBClientCursor::more(). Note that the server logs the slow query at ~230ms on my machine, but the process continues to consume 100% cpu for 380 seconds. |
| Comments |
| Comment by Githook User [ 12/Mar/14 ] | ||||
|
Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}Message: | ||||
| Comment by Githook User [ 12/Mar/14 ] | ||||
|
Author: {u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}Message: | ||||
| Comment by hari.khalsa@10gen.com [ 07/Mar/14 ] | ||||
|
Ah, so the faster run wasn't indexed. Great, that clears everything up. | ||||
| Comment by Ben Becker [ 07/Mar/14 ] | ||||
|
Hi Hari/All, FYI, when I add an ascending index to the 'a' field, performance is the same as '_id'. I reran without the index, but perftools doesn't have a chance to sample much. So I added a loop to the test's main function which calls query() 100 times. SVG is attached; see in_query_100_unindexed.svg.
Happy to run any additional tests – just say the word. | ||||
| Comment by hari.khalsa@10gen.com [ 07/Mar/14 ] | ||||
|
PS. Ben, can you post a SVG of the profile when it runs with 'a' instead of _id? Not sure about why that's different. Perhaps also the .cpp you used to generate that SVG as well. | ||||
| Comment by hari.khalsa@10gen.com [ 07/Mar/14 ] | ||||
|
I think I know what's going on. Each $in clause specifies a new allowed interval for the index key. When we move to a new index key we have to make sure it's in our set of allowed values. If our set of bounds has many intervals we can spend a lot of time checking that it's in our bounds. Each value for $in is another interval. We can fix this in two ways: 1. Make the "is the value for this field in our bounds for that field" faster. Right now it does a linear scan through a sorted list which is generally small except in cases like this. We can binary search. See here: 2. If we have absurdly sized $in clauses, don't index them. Absurd is, say, >N clauses where N is some made-up number so most people aren't impacted. We can do this here (check for MATCH_IN and check the # of clauses): | ||||
| Comment by Ben Becker [ 07/Mar/14 ] | ||||
|
Done. | ||||
| Comment by Andy Schwerin [ 07/Mar/14 ] | ||||
|
benjamin.becker, can you please attach the SVG file of the profile? | ||||
| Comment by Ben Becker [ 07/Mar/14 ] | ||||
|
If you change the query field to 'a' instead of _id (line 18), the response time is much better; in fact faster than v2.4:
|