[SERVER-26991] Inconsistent handling of RegEx options Created: 10/Nov/16  Updated: 08/Oct/21  Resolved: 07/Jul/21

Status: Closed
Project: Core Server
Component/s: JavaScript, Querying, Shell, Tools
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Hannes Magnusson Assignee: Mickey Winters
Resolution: Done Votes: 0
Labels: sbe-diff, sbe-post-v1, sbe-rollout, sp-shell
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by PYTHON-2721 Test failure - invalid flag in regex ... Closed
Documented
is documented by DOCS-14630 Investigate changes in SERVER-26991: ... Closed
Duplicate
duplicates SERVER-57079 Regex with "u" option fails on 5.0.0-... Closed
Related
related to SERVER-26992 Sorting on regex field should not dep... Backlog
is related to DOCS-14628 [Server] extended json supported rege... Backlog
is related to TOOLS-2916 extended json supported regex flags r... Accepted
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v4.4
Sprint: Query Execution 2021-06-28, Query Execution 2021-07-12
Participants:

 Description   

The server documents certain regex modifiers it supports: imxs
https://docs.mongodb.com/manual/reference/operator/query/regex/#op._S_options

The server is itself inconsistent about which options it supports:
gim: https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/bson/bsonelement.cpp#L245-L264
imxs: https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/db/matcher/expression_leaf.cpp#L232-L244

These modifiers are inconsistent with BSON Regex modifiers: imxlsu
http://bsonspec.org/spec.html

The shell however only allows and supports: gimy
https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/third_party/mozjs-45/extract/js/src/vm/RegExpObject.cpp#L933-L955

The server tools allow and support yet another set: gims
https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/gotools/common/bsonutil/bsonutil.go#L282-L287
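
To make the overlap concrete, here is a minimal Python sketch that compares the flag sets listed above. The set contents are copied from this description; the component labels and the helper names (`common_flags`, `all_flags`) are hypothetical and exist only for illustration.

```python
# Flag sets exactly as listed in this ticket's description (r3.4.0-rc3).
# The dictionary keys are informal labels, not real identifiers.
FLAG_SETS = {
    "server bsonelement.cpp": set("gim"),
    "server expression_leaf.cpp": set("imxs"),
    "BSON spec": set("imxlsu"),
    "shell RegExpObject.cpp": set("gimy"),
    "tools bsonutil.go": set("gims"),
}


def common_flags(flag_sets):
    """Return the flags that every listed component accepts."""
    return set.intersection(*flag_sets.values())


def all_flags(flag_sets):
    """Return every flag accepted by at least one component."""
    return set.union(*flag_sets.values())


if __name__ == "__main__":
    shared = common_flags(FLAG_SETS)
    print("accepted everywhere:", "".join(sorted(shared)))
    for name, flags in FLAG_SETS.items():
        # Flags this component accepts that are not universally accepted.
        extra = "".join(sorted(flags - shared))
        print(f"{name}: extra flags {extra!r}")
```

Going by the sets above, only `i` and `m` are accepted by every component.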



 Comments   
Comment by Githook User [ 02/Jul/21 ]

Author:

{'name': 'Mickey. J Winters', 'email': 'mickey.winters@mongodb.com', 'username': 'mjrb'}

Message: SERVER-26991: Inconsistent handling of RegEx options
Branch: master
https://github.com/mongodb/mongo/commit/37b5ee33c469c2341e4d6988cc91a23aca440291

Comment by Bernie Hackett [ 07/Jun/21 ]

There's a possibly Python-specific wrinkle here. In Python 3 all text is Unicode, and regular expressions have the 'u' flag set by default.

>>> r = re.compile("")
>>> r.flags
32
>>> r.flags & re.UNICODE
re.UNICODE
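
A slightly fuller sketch of the same behaviour, using only the standard `re` module: `str` patterns carry `re.UNICODE` implicitly, while `re.ASCII` patterns and `bytes` patterns do not. The helper name `has_unicode_flag` is hypothetical.

```python
import re


def has_unicode_flag(pattern):
    """Return True if a compiled pattern carries the re.UNICODE flag."""
    return bool(pattern.flags & re.UNICODE)


if __name__ == "__main__":
    # A str pattern compiled with no explicit flags still reports UNICODE.
    print(has_unicode_flag(re.compile("")))
    # re.ASCII and re.UNICODE are mutually exclusive for str patterns.
    print(has_unicode_flag(re.compile("", re.ASCII)))
    # bytes patterns never carry UNICODE.
    print(has_unicode_flag(re.compile(b"")))
```

So a driver that maps `re.UNICODE` to the 'u' option will emit 'u' for any ordinary Python 3 `str` pattern, even when the user never asked for it.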

Comment by Ian Boros [ 08/May/18 ]

Whenever we get around to this we should probably also fix the fact that the shell doesn't seem to validate the regex options it's given correctly:

db.t.insert({a: "hi"});
const cur = db.t.find({a:{$regex: "hi", $options: "Q__"}}); // "Q__" or anything else works
printjson(cur.toArray()); 

 

Comment by Bernie Hackett [ 07/Nov/17 ]

I agree, but that would be a backward-breaking change for users that store regular expressions rather than just use them in queries. It will also just break a lot of existing applications that are (accidentally?) successfully using language-native regular expressions. I'd prefer to wait until this server ticket is resolved and the BSON spec updated. Then we can decide how and when to make breaking changes to drivers.

Comment by Roy Williams [ 07/Nov/17 ]

@behackett Agreed,

This could be another ticket, but it seems like the mongo-python-driver should be filtering out invalid flags or throw instead of passing them along to the server only to have them rejected.

https://github.com/mongodb/mongo-python-driver/blob/9051b65510f9aafa7509c4557ff9581b0ffd4474/bson/regex.py#L62-L67
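
As a rough illustration of the "filter or throw" idea, here is an application-side sketch. It is not the driver's code: the helper names (`options_from_pattern`, `to_regex_query`), the flag-to-letter table, and the assumption that the server accepts only the documented `imxs` options are all hypothetical choices made for this example.

```python
import re

# Assumption: only the options documented for $regex are accepted.
SERVER_OPTIONS = "imsx"

# Assumed mapping from Python re flags to BSON-style option letters.
_PY_FLAG_TO_LETTER = [
    (re.IGNORECASE, "i"),
    (re.LOCALE, "l"),
    (re.MULTILINE, "m"),
    (re.DOTALL, "s"),
    (re.UNICODE, "u"),
    (re.VERBOSE, "x"),
]


def options_from_pattern(pattern, strict=False):
    """Translate a compiled pattern's flags into a $options string.

    Unsupported letters are dropped, or raise ValueError when strict=True.
    """
    letters = [ltr for flag, ltr in _PY_FLAG_TO_LETTER if pattern.flags & flag]
    rejected = [ltr for ltr in letters if ltr not in SERVER_OPTIONS]
    if rejected and strict:
        raise ValueError("unsupported regex options: " + "".join(rejected))
    return "".join(ltr for ltr in letters if ltr in SERVER_OPTIONS)


def to_regex_query(field, pattern, strict=False):
    """Build a {field: {$regex, $options}} filter from a compiled pattern."""
    return {
        field: {
            "$regex": pattern.pattern,
            "$options": options_from_pattern(pattern, strict),
        }
    }


if __name__ == "__main__":
    # The implicit Python 3 'u' flag is filtered out rather than sent along.
    print(to_regex_query("my_field", re.compile("^myname", re.IGNORECASE)))
```

Filtering silently changes what the pattern means if 'u' or 'l' mattered to the caller, which is why the sketch also offers a `strict` mode that raises instead.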

Comment by Bernie Hackett [ 07/Nov/17 ]

rwilliams-lyft, I suggest not using Python regular expressions with MongoDB. MongoDB uses PCRE for regular expression support. Though Python regular expressions are similar, they are not the same. Sometimes the difference will cause the server to return an error. Sometimes the difference will cause you to get different results than you expected. You are better off using the $regex query operator instead:

https://docs.mongodb.com/manual/reference/operator/query/regex/

Comment by Roy Williams [ 07/Nov/17 ]

FWIW we just got bit by this during our upgrade to Python 3. Python 3 regexes always have the `u` flag set, which is getting passed through https://github.com/mongodb/mongo-python-driver unmodified. This then led to us doing a full table scan instead of using an index since the index.

For example (using mongoengine):

```
myclass.objects(my_field__startswith="myname")
```

will perform the mongo query

```
db.myclass.find({"my_field": /^myname/u});
```

Comment by Bernie Hackett [ 14/Nov/16 ]

According to the design document, $regex and collations don't mix.

The $regex predicate will not respect a collation. For this reason, only indices with the SBC collation can be used to answer a regular expression.

That would appear to limit the usefulness of 'u' server side. 'l' seems like a weird option server side.

Comment by David Golden [ 11/Nov/16 ]

I suggest someone familiar with collation look into whether 'u' is significant/useful (and whether it's supported by the underlying regex engine). For example, in Perl, you need "u" with "i" for Unicode-aware case-insensitive matching. Also, "u" changes the meaning of "\w" and "\d" to match all Unicode numbers. (Perl also offers an "a" modifier to turn that off so "\d" and "\w" only match ASCII.)

If we're going to change the spec on flags, we should do so in a collation-aware way now that we support it.

Comment by Bernie Hackett [ 11/Nov/16 ]

The server isn't consistent internally. Here it expects imsx:

https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/db/matcher/expression_leaf.cpp#L232-L247

And here it expects gim:

https://github.com/mongodb/mongo/blob/r3.4.0-rc3/src/mongo/bson/bsonelement.cpp#L245-L263

That last part is really confusing considering SERVER-2943.
