[SERVER-522] Better detection of regular expression constant prefix Created: 06/Jan/10  Updated: 12/Jul/16  Resolved: 15/Jan/10

Status: Closed
Project: Core Server
Component/s: JavaScript, Querying
Affects Version/s: None
Fix Version/s: 1.3.1

Type: Improvement Priority: Major - P3
Reporter: Mathieu Poumeyrol Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

Documentation states: "For regular expressions of the form /^''normalchars''.*/, the database will use an index when available and appropriate."

However, defining "normal chars" as only spaces, ASCII letters and digits is far too restrictive. For instance, our ids are URL paths, and all of them of course start with a /, which makes the indexes completely useless for our purposes.
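The following is a hypothetical reconstruction of the documented rule. The name restrictivePrefix is illustrative, and this is not the jsobj.cpp code. It shows why such ids never get an index: the scan stops at the leading /, so no prefix survives. To stay correct, it only accepts the exact documented shape /^normalchars.*/ (or /^normalchars/).

```cpp
#include <cctype>
#include <string>

// Hypothetical reconstruction of the documented rule: the pattern must be
// '^' followed by "normal chars" (space, ASCII letters, digits), optionally
// followed by ".*". Anything else yields no usable prefix.
std::string restrictivePrefix(const std::string& regex) {
    if (regex.empty() || regex[0] != '^') {
        return "";  // not anchored at the start: no usable prefix
    }
    std::size_t i = 1;
    while (i < regex.size()) {
        unsigned char c = static_cast<unsigned char>(regex[i]);
        if (c != ' ' && !std::isalnum(c)) {
            break;  // first character outside the "normal" set
        }
        ++i;
    }
    const std::string rest = regex.substr(i);
    if (rest.empty() || rest == ".*") {
        return regex.substr(1, i - 1);  // the normal chars after '^'
    }
    return "";  // e.g. "^/articles/.*" stops at '/', so nothing is usable
}
```

With this rule, "^abc.*" yields "abc", but any URL-path pattern such as "^/articles/.*" yields an empty prefix.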

Please have a look at:

http://github.com/kali/mongo/commit/d1664437ef4b1c0dab490522f55521bcb6b04cc1



 Comments   
Comment by Mathias Stearn [ 15/Jan/10 ]

That should do it. According to man pcrepattern, a non-alphanumeric character preceded by a backslash represents itself.
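The pcrepattern rule can be captured in a small helper. This is a sketch with a hypothetical name (escapedLiteral), not code from the commit above. It also rejects non-ASCII bytes after a backslash, which is stricter than PCRE requires, so that it never has to decode a multi-byte character.

```cpp
#include <cctype>
#include <optional>

// Decide whether the escape sequence "\c" is a plain literal that can be
// appended to an index prefix. A backslash followed by a non-alphanumeric
// character stands for that character itself. A backslash followed by a
// letter or digit has a special meaning (\d, \b, \1, \Q ...), so a
// conservative prefix scanner must stop there.
std::optional<char> escapedLiteral(char next) {
    unsigned char c = static_cast<unsigned char>(next);
    if (c == '\0' || c >= 0x80 || std::isalnum(c)) {
        // Special sequence, dangling backslash, or a non-ASCII byte
        // (rejected here to keep the sketch free of UTF-8 decoding).
        return std::nullopt;
    }
    return next;  // e.g. "\/" -> '/', "\." -> '.'
}
```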

Comment by auto [ 14/Jan/10 ]

Author:

{'name': 'Mathias Stearn', 'email': 'mathias@10gen.com'}

Message: Use indexes for regexes with escape chars SERVER-522
http://github.com/mongodb/mongo/commit/30403cdeb51b721ee0b68b9b61c69e62c2b00ae9

Comment by Mathias Stearn [ 13/Jan/10 ]

I had assumed that /\/asdf/ would become RegExp('/asdf'). Instead JS does the opposite for some reason and adds the \ even when not needed.

I also need to make simpleRegex support the {$regex: "", $options: ""} syntax.

Comment by Mathieu Poumeyrol [ 13/Jan/10 ]

Sorry, this is indeed better than it was, but I think it is still not good enough, as it does not take escaped metacharacters into account. In the case of URLs, one needs the slash, and the slash has to be escaped because it is usually used as the regexp delimiter.

My commit proposal in git makes the code look a bit messy, but parsing regexps is messy. It handles escaped characters, accepts UTF-8 characters as well, and I also provided some unit tests. I am ready to rework this patch if there is anything you don't like.
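Handling UTF-8 in a prefix scanner mostly means treating a multi-byte character as a single unit. This matters when a quantifier such as ? follows it, because in UTF-8 mode PCRE makes the whole character optional, not just its last byte. The following is a minimal sketch of measuring one such unit. The helper name is hypothetical and is not taken from the proposed patch.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence starting at s[i]: the lead byte
// plus any continuation bytes (bit pattern 10xxxxxx) that follow it.
// Precondition: i < s.size().
std::size_t utf8AtomLength(const std::string& s, std::size_t i) {
    std::size_t j = i + 1;
    while (j < s.size() &&
           (static_cast<unsigned char>(s[j]) & 0xC0) == 0x80) {
        ++j;  // skip continuation bytes
    }
    return j - i;
}
```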

Comment by Mathias Stearn [ 12/Jan/10 ]

http://github.com/mongodb/mongo/commit/e73ba7834e523505856aac3a29305b2822b9b37a

Now it will use all characters up to the first PCRE "meta-character". This should work with Unicode characters as well.
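The following is a sketch of a scanner that uses characters up to the first metacharacter. It is not the committed code, and the name constantPrefix is hypothetical. It takes the metacharacter list from man pcrepattern and adds one refinement. A quantifier (*, ? or {) applies to the atom before it, so that atom must be dropped: /^abc*/ matches "ab", so only "ab" is constant. The sketch also combines the escape rule and UTF-8 handling discussed in this thread. It assumes no option flags are set.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch: return the constant prefix of a pattern anchored
// with '^', or "" if no safe prefix can be derived.
std::string constantPrefix(const std::string& re) {
    if (re.empty() || re[0] != '^') {
        return "";  // unanchored: a match may start anywhere
    }
    if (re.find('|') != std::string::npos) {
        return "";  // alternation may make the whole prefix optional
    }
    const std::string stopChars = "^$.[()+";  // metachars that end the prefix
    std::string prefix;
    std::size_t lastAtom = 0;  // prefix length before the most recent atom
    std::size_t i = 1;
    while (i < re.size()) {
        unsigned char c = static_cast<unsigned char>(re[i]);
        if (c == '*' || c == '?' || c == '{') {
            // The previous atom may match zero times, so it is not constant.
            prefix.resize(lastAtom);
            return prefix;
        }
        if (stopChars.find(static_cast<char>(c)) != std::string::npos) {
            return prefix;  // '+' keeps the previous atom (one or more)
        }
        lastAtom = prefix.size();
        if (c == '\\') {
            unsigned char next =
                i + 1 < re.size() ? static_cast<unsigned char>(re[i + 1]) : 0;
            if (next == 0 || next >= 0x80 || std::isalnum(next)) {
                return prefix;  // \d, \b, \Q and friends: stop here
            }
            prefix += static_cast<char>(next);  // "\/" stands for '/'
            i += 2;
            continue;
        }
        // Copy one whole UTF-8 character (lead byte + continuation bytes).
        prefix += re[i++];
        while (i < re.size() &&
               (static_cast<unsigned char>(re[i]) & 0xC0) == 0x80) {
            prefix += re[i++];
        }
    }
    return prefix;
}
```

Stopping early or dropping a quantified atom only shortens the prefix and widens the index range. This keeps the sketch on the safe side, assuming the full regex is still evaluated against every candidate the index returns.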

Also, setting the 'm' (multiline) flag won't prevent an index from being used. I still need to support 'x' (extended), but I doubt many people are using that in queries.
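Option flags can be screened separately before a prefix is used. The following sketch uses a hypothetical name and is deliberately conservative. It is stricter than the comment above about 'm'. With PCRE's multiline option, ^ also matches right after a newline. A value whose only match starts on a later line would therefore sort outside the prefix range, and a prefix-only index scan would miss it.

```cpp
#include <string>

// Hypothetical, conservative screen of regex option letters:
//  - 'i' (caseless) breaks the byte-wise prefix comparison;
//  - 'x' (extended) makes whitespace and '#' comments non-literal,
//    which a plain prefix scanner does not handle;
//  - 'm' (multiline) lets '^' match after a newline, outside the prefix range;
//  - 's' (dotall) only changes what '.' matches, and '.' ends the prefix anyway.
bool flagsAllowIndex(const std::string& options) {
    for (char f : options) {
        if (f != 's') {
            return false;  // unknown or prefix-breaking option
        }
    }
    return true;
}
```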

Comment by Eliot Horowitz (Inactive) [ 06/Jan/10 ]

Just need to tweak this function in jsobj.cpp:

string BSONElement::simpleRegex() const

Make sure we have positive and negative tests.

Generated at Thu Feb 08 02:54:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.