[SERVER-7916] Multikey index performance regression Created: 12/Dec/12 Updated: 15/Feb/13 Resolved: 04/Jan/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | 2.2.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Arkadiy Kukarkin | Assignee: | Aaron Staple |
| Resolution: | Done | Votes: | 1 |
| Labels: | SRL | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 11.04 |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
After upgrading to 2.2.2 (and php driver 1.3.1) we've started noticing some odd timeouts. We use a multikey index on a set of "digested" words as a pseudo-fulltext search, and then query with $all : [ ... ], which has worked very well. After the upgrade, the query optimizer doesn't seem to want to use those indexes in many cases when combined with sort, which results in awful nscanned:
If we provide a hint, performance is better, but still not great because only the first term inside $all appears to be used for the index. I'm not sure if the previous behavior would have been to use both (i.e. find both sets then compute intersection), but performance used to be much better.
If we manually reorder the terms inside $all, the result is pretty good, similar to past performance:
Finally, the behavior of the optimizer seems unpredictable, for example some (absolutely catastrophic) slow queries logged like this:
behave reasonably if reproduced exactly via the mongo shell:
|
| Comments |
| Comment by Aaron Staple [ 04/Jan/13 ] | |||||||||||||||||||||
|
Thanks Arkadiy. Let us know if you have any other questions. | |||||||||||||||||||||
| Comment by Arkadiy Kukarkin [ 04/Jan/13 ] | |||||||||||||||||||||
|
Aaron, thank you for the excellent answer, clears a lot of things up. I can accept this as "works as designed" edge case. | |||||||||||||||||||||
| Comment by Aaron Staple [ 04/Jan/13 ] | |||||||||||||||||||||
|
Hi Arkadiy, I think there are three separate questions in your initial post. I'll try to address them individually: 1) For query db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] }} ).sort( { last_seen: -1 }).limit(500).explain(), why does the query optimizer use the {ftx:1}index on 2.0 but the {last_seen:-1}index on 2.2? I can't be sure without reproducing from your data set, but my guess is this change is caused by the following: You may be aware that when mongo runs a query and there are multiple candidate query plans available, it tries all the candidate plans in an interleaved manner and selects the one it considers "best" <http://docs.mongodb.org/manual/core/read-operations/#query-optimization>. This procedure of trying out different candidate plans may stop when a certain number of results have been seen from a given candidate query plan. I should introduce the concept of an ordered vs unordered query plan. An ordered query plan is one that produces results consistent with the order of your sort spec. In your example, because your sort order is last_seen:-1, the query plan on the {last_seen:-1}index is ordered. An unordered query plan is one that produces results not consistent with the order of your sort spec. In your example, because your sort order is last_seen:-1, the query plan on the {ftx:1}index is unordered because ftx:1 ordering is not the same as the last_seen:-1 ordering. As mentioned above, when there are multiple candidate plans for a query, results are retrieved from them in an interleaved manner. If some plans are ordered and some are unordered, then all results must be retrieved from an unordered plan for it to be declared the "winner". However, for ordered plans, a plan is declared a "winner" after a particular number of matches are identified for that query plan. In 2.0, the number of matches needed to select an ordered query plan as the "best" was always equal to the query batch size. So in your example, the ordered last_seen:-1 plan would be chosen as the "best" after 500 matches were found for that plan because the batch size for your query is 500. In 2.2, the behavior was changed so that an ordered query plan is chosen as the "best" after 101 matches are found for that query plan. This means the last_seen:-1 query plan will be chosen if 101 matches are seen for that plan before all matches are seen for the unordered ftx:1 paln. To summarize, in 2.0 your ordered last_seen:-1 index would be chosen after 500 matches. But in 2.2 it would be chosen after only 101 matches. So the last_seen:-1 index will be chosen over the ftx:1 index in more cases in 2.2. This behavior change from 2.0 is essentially "works as designed" for now. It was part of the specced behavior of If you aren't getting good performance with the automatically selected index, I suggest using a hint. 2) Does a $all query do an index intersection? $all has never been implemented as an index intersection, but adding this optimization is 3) For query db.a_ftx.find( { ftx: { $all: [ "hamilton", "jo" ] }} ).sort( { last_seen: -1 }).limit(500).explain(), why is performance slower when running an explain on the shell? I suspect the reason for this has to do with another aspect of how the query optimizer works, relating to the query plan cache. As I mentioned in #1, when multiple candidate query plans are available for a query, mongo will simultaneously run each of them in an interleaved manner until one is chosen as a winner. The most selective plan is then recorded in a cache of query plans. When another query comes in that is similar to the query that already ran, the query plan from the plan cache is used to perform the query instead of attempting all candidate plans. However, when you do an "explain" query, the query plan cache is not used. Instead, all candidate query plans are evaluated from scratch. I suspect that the logged slow query resulted because the {last_seen:-1}index was recorded in the query plan cache for this type of query, and then reused for the logged query resulting in a slow query. When you ran explain, it tried both the {last_seen:-1}and {ftx:1}indexes and chose the {ftx:1}index. As with #1, if your desired index is not being chosen automatically I suggest using a hint. | |||||||||||||||||||||
| Comment by Arkadiy Kukarkin [ 02/Jan/13 ] | |||||||||||||||||||||
|
Hi Stephen, anything on this? We're not seeing many instances of queries not using the index entirely, but this still seems like a regression in the query optimizer. | |||||||||||||||||||||
| Comment by Arkadiy Kukarkin [ 17/Dec/12 ] | |||||||||||||||||||||
|
Hi Stephen, I can confirm that the first query correctly (more or less) uses the ftx index in 2.0.4:
(ignore the high millis count, this is on a very small EC2 instance) I can't reproduce the case where the index is not used under pre-2.2 Prior to the upgrade, the machines were 2.0.5 (maybe one or two still on 2.0.4) and php driver was at 1.2.12 | |||||||||||||||||||||
| Comment by Stennie Steneker (Inactive) [ 14/Dec/12 ] | |||||||||||||||||||||
|
Hi Arkadiy, Can you confirm the MongoDB server and PHP driver versions you were using prior to upgrading to 2.2.2 server / 1.3.1 php driver? Would it be possible for you to provide the results of the same explain queries with the older server version? Thanks, | |||||||||||||||||||||
| Comment by Arkadiy Kukarkin [ 13/Dec/12 ] | |||||||||||||||||||||
|
Ian, do you want me to submit anything to help reproduce? | |||||||||||||||||||||
| Comment by Arkadiy Kukarkin [ 12/Dec/12 ] | |||||||||||||||||||||
|
Also, it looks like the second case (using 'ftx' index with 'the') is actually slower than the first (using 'last_seen') but this is likely due to other load/ops on the server. | |||||||||||||||||||||
| Comment by Arkadiy Kukarkin [ 12/Dec/12 ] | |||||||||||||||||||||
|
The latter behavior particularly is really killing us, because some % of searches perform a full collection scan, which is very slow and yields hundreds of times. |