[SERVER-522] Better detection of regular expression constant prefix Created: 06/Jan/10 Updated: 12/Jul/16 Resolved: 15/Jan/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 1.3.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathieu Poumeyrol | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
Documentation states: "For regular expressions of the form /^''normalchars''.*/, the database will use an index when available and appropriate." However, the definition of "normal chars" as space, ASCII letters and digits is hugely restrictive. For instance, our ids are URL paths, all of which start with a /, so the index is never used for our prefix queries. Please have a look at: http://github.com/kali/mongo/commit/d1664437ef4b1c0dab490522f55521bcb6b04cc1 |
| Comments |
| Comment by Mathias Stearn [ 15/Jan/10 ] |
|
That should do it. According to man pcrepattern, a non-alphanumeric character that is preceded by a backslash represents itself. |
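The escape rule quoted from pcrepattern can be sketched roughly as follows. This is a minimal illustration, not the committed change: the function name escapedLiteralPrefix is hypothetical, regex flags are ignored, and any pattern containing '|' is rejected outright as a conservative simplification.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not the server's simpleRegex()): returns the literal
// prefix that every match of an anchored pattern must start with, or "" if
// none can be determined. Regex flags are not considered in this sketch.
std::string escapedLiteralPrefix(const std::string& pattern) {
    if (pattern.empty() || pattern[0] != '^') return "";
    // Conservative assumption: give up on any alternation, even an escaped one.
    if (pattern.find('|') != std::string::npos) return "";

    const std::string meta = "^$.[()?*+{";
    std::string prefix;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') {
            if (i + 1 >= pattern.size()) break;  // dangling backslash
            const unsigned char next = static_cast<unsigned char>(pattern[i + 1]);
            if (std::isalnum(next)) break;       // \d, \w, \b, ... are not literals
            prefix += pattern[++i];              // escaped non-alphanumeric is itself
            continue;
        }
        if (meta.find(c) == std::string::npos) {
            prefix += c;                         // ordinary character
            continue;
        }
        if (c == '*' || c == '?' || c == '{') {
            // The quantifier may make the previous character optional, so drop
            // it (including any UTF-8 continuation bytes).
            while (!prefix.empty() &&
                   (static_cast<unsigned char>(prefix.back()) & 0xC0) == 0x80) {
                prefix.pop_back();
            }
            if (!prefix.empty()) prefix.pop_back();
        }
        break;                                   // first metacharacter ends the prefix
    }
    return prefix;
}
```

With this sketch, a pattern whose source text is ^\/docs\/.* gives the prefix /docs/, which is the URL-path case raised in this ticket.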
| Comment by auto [ 14/Jan/10 ] |
|
Author: Mathias Stearn (mathias@10gen.com). Message: Use indexes for regexes with escape chars |
| Comment by Mathias Stearn [ 13/Jan/10 ] |
|
I had assumed that /\/asdf/ would become RegExp('/asdf'). Instead, JS does the opposite for some reason and adds the \ even when it is not needed. I also need to make simpleRegex support the {$regex: "", $options: ""} syntax. |
| Comment by Mathieu Poumeyrol [ 13/Jan/10 ] |
|
Sorry, this is indeed better than it was, but I think it is still not good enough, as it does not take escaped metachars into account. In the case of URLs, one needs the slash, and the slash has to be escaped because it is usually used as the regexp delimiter. My proposed commit in git makes the code look a bit messy, but parsing regexps is messy. It handles escaped characters, accepts UTF-8 positions as well, and I also provided some unit tests. I am ready to rework this patch if there is something you don't like. |
| Comment by Mathias Stearn [ 12/Jan/10 ] |
|
http://github.com/mongodb/mongo/commit/e73ba7834e523505856aac3a29305b2822b9b37a It will now use all characters up to the first PCRE "meta-character". This should work with unicode chars as well. Also, setting the 'm' (multiline) flag won't prevent an index from being used. Still need to support 'x' (extended), but I doubt many people are using that in queries. |
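The "all characters up to the first meta-character" idea might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in the commit above: the names literalPrefix and dropLastChar are hypothetical, flag handling ('m', 'x') is left out, backslash is treated as just another metacharacter, and patterns containing '|' are rejected outright.

```cpp
#include <string>

// Hypothetical helper: remove the last character of s, including any UTF-8
// continuation bytes that belong to it.
void dropLastChar(std::string& s) {
    while (!s.empty() && (static_cast<unsigned char>(s.back()) & 0xC0) == 0x80) {
        s.pop_back();
    }
    if (!s.empty()) s.pop_back();
}

// Hypothetical helper (not the server's simpleRegex()): returns the constant
// prefix of an anchored pattern, or "" if none can be determined.
std::string literalPrefix(const std::string& pattern) {
    if (pattern.empty() || pattern[0] != '^') return "";  // must be anchored
    // Conservative assumption: an alternation anywhere means no common prefix.
    if (pattern.find('|') != std::string::npos) return "";

    const std::string meta = "\\^$.[()?*+{";
    std::string prefix;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (meta.find(c) == std::string::npos) {
            prefix += c;  // ordinary byte, including non-ASCII UTF-8 bytes
            continue;
        }
        // '*', '?' and '{' may make the preceding character optional.
        if (c == '*' || c == '?' || c == '{') dropLastChar(prefix);
        break;            // stop at the first metacharacter
    }
    return prefix;
}
```

Under these assumptions, ^/users/.* yields /users/, while ^abc* yields only ab because the trailing c is optional. A pattern with no leading ^ yields an empty prefix, which a caller would treat as "no index range available".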
| Comment by Eliot Horowitz (Inactive) [ 06/Jan/10 ] |
|
Just need to tweak string BSONElement::simpleRegex() const in jsobj.cpp. Make sure we have positive and negative tests.