[SERVER-12639] Performance regression with RegEx Created: 06/Feb/14 Updated: 11/Jul/16 Resolved: 19/Feb/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.5.5 |
| Fix Version/s: | 2.6.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | rgpublic | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | 26qa |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Participants: |
| Description |
|
I recently tried to upgrade from 2.4.9 to 2.5.5, and I'm experiencing catastrophic regex query times with 2.5.5, so I had to downgrade again. My query is simply something like this:

2.4.9 gives me this query plan:

2.5.5 gives this query plan:

This plan then runs for literally ages.

Unfortunately I cannot provide test data, because I could only reproduce the problem with our live database, whose data is quite large and cannot be disclosed. Nevertheless, here is the full explain plan for 2.5.5 in case you can discern anything: |
| Comments |
| Comment by David Storch [ 19/Feb/14 ] |
|
Hi rgpublic,

I believe the problem you ran into was fixed as a side-effect of commit d7d110bfa62c51cf0d50ca248cd5dbe05ea34208. In this commit, we re-ordered candidate query plans so that bad index intersection plans will never be chosen over faster single-index plans. We are continuing to work on this area of the query optimizer in order to ensure that index intersection plans are only used when they offer a performance advantage.

I'm resolving this ticket as Fixed per my comment above. Thanks for the detailed bug report!

Thanks, |
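To make the re-ordering idea concrete, here is a toy sketch in Python. It is not MongoDB's plan ranker and does not reproduce the commit above. It assumes, purely for illustration, that each candidate plan carries a numeric score and that the first plan in the candidate list wins a tie. `CandidatePlan`, `order_candidates`, `pick_plan`, and the scores are all hypothetical; the index names are borrowed from the reporter's comment in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical stand-in for a candidate query plan (not a MongoDB type)."""
    name: str
    indexes: Tuple[str, ...]  # indexes the plan would scan
    score: float              # invented ranking score, higher is better

    @property
    def is_intersection(self) -> bool:
        # A plan that scans more than one index is treated as an intersection plan.
        return len(self.indexes) > 1


def order_candidates(plans: List[CandidatePlan]) -> List[CandidatePlan]:
    """Move single-index plans ahead of intersection plans.

    sorted() is stable, so the relative order inside each group is kept.
    """
    return sorted(plans, key=lambda p: p.is_intersection)


def pick_plan(plans: List[CandidatePlan]) -> CandidatePlan:
    """Pick the best-scoring plan; on a tie the earlier candidate wins (assumption)."""
    ordered = order_candidates(plans)
    best = ordered[0]
    for plan in ordered[1:]:
        if plan.score > best.score:  # strict comparison: ties keep the earlier plan
            best = plan
    return best


# Invented candidates and scores, for illustration only.
candidates = [
    CandidatePlan("intersect(folder, type)", ("folder", "type"), 1.5),
    CandidatePlan("scan type_folder", ("type_folder",), 1.5),
    CandidatePlan("scan folder", ("folder",), 1.2),
]

print(pick_plan(candidates).name)  # prints "scan type_folder"
```

Under these assumptions, an intersection plan can still win when it scores strictly higher, but it can no longer win a tie just because it happened to be listed first.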
| Comment by rgpublic [ 10/Feb/14 ] |
|
Thanks for investigating this. I cannot easily check the nightly at the moment, but I very much suspect mine to be the same bug. I've got (among other, unrelated ones) these 3 indices: folder, type, type_folder. In my case, it obviously isn't even necessary to do a regex query on both columns. One regular query and one regex query already trigger the bug.

Glad to hear that it's fixed, because it made 2.5.5 completely unusable for us, which was a bit disappointing because I was looking forward to the new intersection feature :-/

Finally, I wonder which change exactly (JIRA ticket?) fixed this, so that we might possibly close this one as DUPLICATE. |
| Comment by Davide Italiano [ 10/Feb/14 ] |
|
I was able to reproduce the problem with the Enron mail dataset: http://mongodb-enron-email.s3-website-us-east-1.amazonaws.com

This used to happen in 2.5.5 if you have 3 indexes on the fields "a" and "b":
Steps to reproduce on 2.5.5 after you restored the enron_mail dataset:
then run the following query:
cursor.explain() on 2.5.5
cursor.explain() on HEAD (githash 57c49a91767327fedd3b22a1363480f1a7d9ff20)
which is more or less what you would expect (55 ms vs 1032 ms). You can try grabbing the latest nightly build to see if the problem is fixed for you as well. |