[DRIVERS-82] Don't compile BSON regexes to native regexes Created: 03/Apr/13  Updated: 04/Mar/21  Resolved: 22/Jun/16

Status: Closed
Project: Drivers
Component/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: A. Jesse Jiryu Davis Assignee: Barrie Segal
Resolution: Done Votes: 0
Labels: 3.0, regex
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on PYTHON-500 PyMongo can error when retrieving reg... Closed
depends on NODE-38 Don't compile BSON regexes to native ... Closed
depends on RUBY-698 Add optional support to prevent compi... Closed
depends on JAVA-801 Create option to not compile BSON reg... Closed
Related
related to MONGOSH-623 Retrieving stored PCRE-style regex ca... Closed
related to SERVER-11771 extended options for $regex cannot be... Closed
related to SERVER-54588 Query on system.profile() failing whe... Closed
related to CDRIVER-1883 libbson should ensure regex options a... Closed
is related to DRIVERS-331 BSON Regex flags must be alphabetical... Closed
Driver Compliance:
Key Status/Resolution FixVersion
PYTHON-500 Done 2.7
PERL-221 Done
JAVA-801 Done 3.0.0
NODE-38 Done 2.0.40
RUBY-698 Done 1.10.0, 2.0.0

 Description   

Drivers can retrieve Regex-type BSON values from MongoDB under several circumstances:

  • When a regex has been stored in a document, then queried
  • When a regex query is in progress on another connection and the driver queries $cmd.sys.inprog
  • When a regex query is stored in system.profile and the driver queries the profile

Their authors may have written these regexes as PCRE, since they were meant to run on the server, but that is not guaranteed. We can't predict the content of a BSON regex or what its author intended it to match.

Unfortunately, most of the drivers compile BSON regexes into their native regex format, and every language has its own regex flavor. If a regex can't be compiled in the local flavor, then the whole document is unparsable and there's no workaround. Even if the regex is parsable in the local flavor, we as driver authors don't know whether it will behave as intended, since we don't know whether our local flavor matches the flavor its author wrote it for. Also, if the local regex flavor doesn't support the same flags as BSON (i, l, m, s, u, x), then the driver may be unable to round-trip regexes from server to client and back again while preserving all the flags. Finally, we doubt that most regexes retrieved from the server are ever executed client-side, so eagerly compiling all regexes is wasteful.
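To make the compilation problem concrete, here is a small Python sketch. The helper name compiles_natively is made up for illustration. It uses PCRE's recursion construct, (?R), as one example of syntax that Python's re module rejects.

```python
import re


def compiles_natively(pattern: str) -> bool:
    """Return True if Python's `re` module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# PCRE recursion, (?R), can match balanced parentheses on the server,
# but Python's `re` has no such construct and refuses to parse it.
pcre_only = r"\((?:[^()]|(?R))*\)"
portable = r"^foo\d+$"

print(compiles_natively(pcre_only))  # False
print(compiles_natively(portable))   # True
```

A driver that compiles eagerly would fail on the first pattern while decoding, before the application ever sees the document.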

We must change the behavior in two steps.

1. If your driver always compiles retrieved regexes, add an option to disable compilation. When compilation is disabled, represent BSON regexes some other way, e.g. a MongoRegex class that contains the uncompiled regex pattern as a string, and its flags. (The name of the class is up to you.)
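One possible shape for step 1, sketched in Python. The function decode_regex, its compile_re parameter, and the flag table are assumptions made for illustration; each driver will choose its own names and decoder structure.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MongoRegex:
    """Uncompiled BSON regex: pattern text plus BSON flag letters."""
    pattern: str
    flags: str = ""


# Assumption: only flags with a direct `re` equivalent for str patterns.
_NATIVE_FLAGS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "u": re.UNICODE,
    "x": re.VERBOSE,
}


def decode_regex(pattern: str, flags: str, compile_re: bool = True):
    """Decode one BSON regex value.

    compile_re=True mimics the legacy behavior (eager native compilation);
    compile_re=False returns the uncompiled wrapper instead.
    """
    if not compile_re:
        return MongoRegex(pattern, flags)
    native = 0
    for letter in flags:
        native |= _NATIVE_FLAGS[letter]  # KeyError on an unmapped flag
    return re.compile(pattern, native)
```

With compile_re=False, a pattern such as r"\((?R)*\)" decodes without error; with the default it raises re.error.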

A MongoRegex is encoded into a BSON regex, so it's a means of sending a PCRE to the server even if its pattern can't be compiled in the local flavor, and / or its flags aren't supported by the local flavor.
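As a rough illustration of why no native compilation is needed on the way to the server: a BSON regex element is the type byte 0x0B followed by three NUL-terminated strings (key, pattern, options). The function name below is hypothetical, and the sketch encodes a single element rather than a whole document.

```python
def encode_regex_element(name: str, pattern: str, flags: str) -> bytes:
    """Encode one BSON regex element: 0x0B, key, pattern, options."""
    for part in (name, pattern, flags):
        if "\x00" in part:
            raise ValueError("BSON cstrings cannot contain NUL bytes")
    # BSON stores regex options in alphabetical order.
    options = "".join(sorted(set(flags)))
    return (
        b"\x0b"
        + name.encode("utf-8") + b"\x00"
        + pattern.encode("utf-8") + b"\x00"
        + options.encode("utf-8") + b"\x00"
    )


# The pattern is copied through as text, so PCRE-only syntax is fine here.
element = encode_regex_element("r", r"\((?R)*\)", "mi")
```

The pattern and flags are never interpreted by the client, so anything the server accepts can be sent.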

The MongoRegex class should have a try_compile method to convert to a native regex, with a warning in the documentation like this:

Warning: regular expressions retrieved from the server may include a pattern that cannot be compiled into a <LANGUAGE> regular expression, or which matches a different set of strings in <LANGUAGE> than it does when used in a MongoDB query, or it may have flags that are not supported by <LANGUAGE> regular expressions. try_compile() may raise a <WHATEVER> exception.
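A minimal Python sketch of try_compile, under stated assumptions: the flag table is illustrative, and the "l" flag is treated as unsupported because Python allows re.LOCALE only with bytes patterns. The exception types are this sketch's choice, standing in for <WHATEVER>.

```python
import re
from dataclasses import dataclass

# Assumption: BSON flags that map directly onto `re` flags for str patterns.
_NATIVE_FLAGS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "u": re.UNICODE,
    "x": re.VERBOSE,
}


@dataclass(frozen=True)
class MongoRegex:
    pattern: str
    flags: str = ""

    def try_compile(self) -> "re.Pattern":
        """Compile to a native Python regex.

        Raises ValueError for a flag with no native mapping in this sketch,
        and re.error if Python cannot parse the pattern.
        """
        native = 0
        for letter in self.flags:
            if letter not in _NATIVE_FLAGS:
                raise ValueError(f"flag {letter!r} has no native equivalent")
            native |= _NATIVE_FLAGS[letter]
        return re.compile(self.pattern, native)
```

Compilation happens only when the caller asks for it, so a document containing an incompatible regex can still be decoded and inspected.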

Add a method like MongoRegex.from_native to attempt to convert from a native regex to a MongoRegex. It should be documented like:

Warning: <LANGUAGE> regular expressions use a different syntax and different set of flags than BSON regular expressions. A regular expression matches different strings when executed in <LANGUAGE> than it matches when used in a MongoDB query, if it can be used in a query at all.
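A corresponding sketch of from_native in Python, again with an illustrative flag table. It shows one reason for the warning: Python sets re.UNICODE on every compiled str pattern, so the resulting MongoRegex carries a "u" flag the user never typed. Flags without a BSON letter in this table (such as re.ASCII) are silently dropped here.

```python
import re
from dataclasses import dataclass

# Assumption: the `re` flags this sketch knows how to express as BSON letters.
_FLAG_LETTERS = (
    ("i", re.IGNORECASE),
    ("m", re.MULTILINE),
    ("s", re.DOTALL),
    ("u", re.UNICODE),
    ("x", re.VERBOSE),
)


@dataclass(frozen=True)
class MongoRegex:
    pattern: str
    flags: str = ""

    @classmethod
    def from_native(cls, regex) -> "MongoRegex":
        """Build a MongoRegex from a compiled Python str pattern."""
        if not isinstance(regex.pattern, str):
            raise TypeError("this sketch only handles str patterns")
        letters = [letter for letter, flag in _FLAG_LETTERS
                   if regex.flags & flag]
        # BSON expects the option letters in alphabetical order.
        return cls(regex.pattern, "".join(sorted(letters)))


converted = MongoRegex.from_native(re.compile("^a", re.I | re.M))
print(converted.flags)  # imu
```

The pattern text is copied unchanged, so Python-only syntax such as (?P<name>...) would reach the server as-is and might not mean the same thing there.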

2. In the next major (API-breaking) release, disable auto-compilation entirely. There shall be no option to turn automatic compilation back on. Users must retrieve MongoRegex instances and call try_compile to get native regexes.

Native regular expressions will still be accepted by find(), insert(), update(), remove(), runCommand(), etc., but this is discouraged. Users should construct a MongoRegex from a string and flags.



 Comments   
Comment by Andrew Morrow (Inactive) [ 03/Mar/15 ]

C++11 does not have any special handling for regex.
