[SERVER-20432] $regex prefix search with escaped "|" should use tighter index bounds Created: 16/Sep/15 Updated: 16/May/18 Resolved: 26/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.0.5, 3.0.6 |
| Fix Version/s: | 3.6.0-rc2 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | ma6174 | Assignee: | Kyle Suarez |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Initialization
Search and view the mongodb log (run with verbose logging)
1. Search with |
The mongodb v3.0.6 log shows "nscanned:100".
We also tested v2.4.7, where the same query showed "nscanned:0".
2. Search without |
The mongodb v3.0.6 log shows "nscanned:0", which is expected.
mongodb explain info
|
| Sprint: | Query 2017-10-23, Query 2017-11-13 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
We found a slow query in the log that took 271 seconds. On inspection, "nscanned:107090777" is very large. We expected the query, which is a prefix search, to use the index and match only a few documents.
|
| Comments |
| Comment by Kyle Suarez [ 26/Oct/17 ] | |
|
Hey all, With the commit above, regular expressions containing certain sequences of escaped | characters will now be treated as non-special and are eligible to use tight index bounds. This doesn't include every conceivable way you can have a literal pipe character in a regular expression; for example, a pipe in a character class or escaped with the \Q...\E escape sequence may still be mistakenly treated as a special regex. A full fix that works in all circumstances would depend on SERVER-16622. Hopefully, this fix improves performance for some common use cases. You can expect it to be included in the coming MongoDB 3.6 release. Best, | |
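The behaviour described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not the server's actual C++ implementation: the names `simple_prefix` and `prefix_bounds`, the set of special characters, and the bound computation are all assumptions made for the example. It treats a backslash-escaped pipe as a literal character and conservatively rejects everything else, including a pipe inside a character class.

```python
from typing import Optional, Tuple

# Hypothetical, simplified set of characters that disqualify a "simple prefix".
SPECIAL = set(".*+?()[]{}|^$\\")


def simple_prefix(pattern: str) -> Optional[str]:
    """Return the literal prefix of an anchored pattern, or None if the
    pattern is not a plain prefix search (illustrative logic only)."""
    if not pattern.startswith("^"):
        return None
    out = []
    i = 1
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Only an escaped pipe is accepted as a literal here; any other
            # escape sequence makes this sketch give up.
            if i + 1 < len(pattern) and pattern[i + 1] == "|":
                out.append("|")
                i += 2
                continue
            return None
        if ch in SPECIAL:
            return None
        out.append(ch)
        i += 1
    return "".join(out) if out else None


def prefix_bounds(prefix: str) -> Tuple[str, str]:
    """Half-open range [lower, upper) covering strings that start with prefix.
    Assumes the last character is not the maximum code point."""
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


print(simple_prefix(r"^abc\|def"))   # escaped pipe: literal prefix
print(simple_prefix(r"^abc|def"))    # real alternation: not a prefix search
print(simple_prefix(r"^abc[|]"))     # pipe in a character class: rejected
print(prefix_bounds("abc|def"))
```

With bounds like these, an index scan only needs to visit keys inside the range instead of every key in the index, which is the difference between the small and very large "nscanned" values reported in this ticket.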
| Comment by Githook User [ 26/Oct/17 ] | |
|
Author: {'email': 'kyle.suarez@mongodb.com', 'name': 'Kyle Suarez', 'username': 'ksuarz'}Message: | |
| Comment by Oleg Rekutin [ 29/Aug/17 ] | |
|
Thank you Ian! | |
| Comment by Ian Whalen (Inactive) [ 25/Aug/17 ] | |
|
Hey Oleg, thanks for following up with your thoughts. Dave and the query team have talked and agreed. We'll see if we can get this in for the upcoming 3.6 release. | |
| Comment by Oleg Rekutin [ 28/Apr/17 ] | |
|
More detail. Specifically this line is wrong: https://github.com/mongodb/mongo/blob/r3.2.12/src/mongo/db/query/index_bounds_builder.cpp#L71
Clearly, that code ignores escaped | characters! These two regular expressions are different: Expression A is NOT a prefix query. Expression B is 100% a prefix query, with nothing optional in it. However, due to the code referenced above and this bug, it still fails the "simpleRegex" check. | |
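To make the distinction concrete, here is a small Python check using the standard `re` module. The two patterns are hypothetical examples chosen for illustration; they are not the expressions from the comment above, which did not survive in this export.

```python
import re

# Hypothetical patterns, for illustration only.
alternation = re.compile(r"^abc|def")     # unescaped pipe: "^abc" OR "def"
literal_pipe = re.compile(r"^abc\|def")   # escaped pipe: literal text "abc|def"

samples = ["abc|def-1", "abcxyz", "xyzdef", "zzz"]

# The alternation can match anywhere "def" occurs, so it is not a prefix query.
alternation_hits = [s for s in samples if alternation.search(s)]

# The escaped form only matches strings that start with the literal "abc|def".
literal_hits = [s for s in samples if literal_pipe.search(s)]

print(alternation_hits)
print(literal_hits)
```

Only the second pattern constrains every match to a fixed leading string, which is what makes tight index bounds possible for it.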
| Comment by Oleg Rekutin [ 28/Apr/17 ] | |
|
@david.storch I think you are missing the point... the point is that an escaped pipe character should NOT be treated in any special way at all. I don't think it is valid to close this bug as a duplicate of SERVER-16622. This bug should be reopened. SERVER-16622 describes smart handling of an actual | character, which IS NOT the case here. Merely using a pipe character in the text makes it impossible to do prefixed index scans. | |
| Comment by David Storch [ 05/Oct/15 ] | |
|
Hi ma6174, After reviewing this ticket, the engineering team responsible for the "Querying" component has decided to consider this a duplicate of SERVER-16622. Fixing the backslash-escaped "|" character case makes sense to do as part of the larger ticket, as this will require parsing the regular expression and analyzing the parse tree. From an engineering perspective, we would much rather use proper regex parsing than introduce a hack that special cases the string "|". There is a lot more context on SERVER-16622, as my colleague Stephen pointed out in an earlier response. Please watch SERVER-16622 for progress updates. Best, | |
| Comment by ma6174 [ 16/Sep/15 ] | |
|
Thanks. | |
| Comment by Stennie Steneker (Inactive) [ 16/Sep/15 ] | |
|
Hi ma6174, We appreciate all bug reports and product suggestions. This issue will be triaged by the engineering team and considered in the next round of planning. You can upvote and watch this issue for updates. Thanks, | |
| Comment by ma6174 [ 16/Sep/15 ] | |
|
Hi, Since we escaped the character "|" as "\|", it should be treated as a normal character, not as the regular expression "OR", and the query should use the index. If this is not fixed and a user runs such a search against a very large collection, the whole database may become very slow or unavailable. Will you consider fixing this? Regards, | |
| Comment by Stennie Steneker (Inactive) [ 16/Sep/15 ] | |
|
Hi, Thanks for reporting this issue and including steps to reproduce. It looks like this is a consequence of the changes for In your particular case the "|" is escaped but unfortunately this is still matching as a "non-simple" regex, which results in a full index scan rather than bounds limited to the regex prefix. Regards, |