[SERVER-57079] Regex with "u" option fails on 5.0.0-alpha0 Created: 19/May/21  Updated: 01/Jul/21  Resolved: 01/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Prashant Mital (Inactive) Assignee: Mickey Winters
Resolution: Duplicate Votes: 0
Labels: sbe-post-v1, sbe-rollout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-26991 Inconsistent handling of RegEx options Closed
Operating System: ALL
Backport Requested:
v5.0
Steps To Reproduce:
  • Using PyMongo 3.11.4, connect to a replica set running the MongoDB version mentioned above (5.0.0-alpha0):

            from pymongo import MongoClient
            client = MongoClient(directConnection=False)
    

  • Insert the following document:

            client.db.test.insert_one({"x": "hello_test"})
    

  • Run the following find operation:

            import re
            # find() returns a lazy cursor; iterating it sends the query to the server
            print(list(client.db.test.find({"x": re.compile("^hello.*")})))
    

Sprint: Query Execution 2021-06-14, Query Execution 2021-06-28, Query Execution 2021-07-12

 Description   

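This regression will break every PyMongo user on Python 3 who queries with regular expressions, because Python 3 sets the re.UNICODE flag on str patterns by default and PyMongo sends that flag to the server as the u option.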
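To see why a plain Python 3 pattern ends up carrying u, the sketch below maps Python `re` flags to a BSON regex options string. `bson_regex_options` is a hypothetical helper written for illustration, not PyMongo's own function; it assumes the mapping i/l/m/s/u/x from the BSON spec quoted in the comments, emitted in alphabetical order.

```python
import re

# Hypothetical helper (not PyMongo's API): maps Python `re` flags to a BSON
# regex options string, emitting characters in alphabetical order as the
# BSON spec requires.
_FLAG_TO_OPTION = [
    (re.IGNORECASE, "i"),
    (re.LOCALE, "l"),
    (re.MULTILINE, "m"),
    (re.DOTALL, "s"),
    (re.UNICODE, "u"),
    (re.VERBOSE, "x"),
]


def bson_regex_options(flags: int) -> str:
    """Return the BSON options string for a compiled pattern's flags."""
    return "".join(opt for flag, opt in _FLAG_TO_OPTION if flags & flag)


if __name__ == "__main__":
    # A str pattern in Python 3 gets re.UNICODE implicitly, even with no flags.
    print(bson_regex_options(re.compile("^hello.*").flags))        # u
    print(bson_regex_options(re.compile("^hello.*", re.I).flags))  # iu
    # bytes patterns do not get re.UNICODE, so no options are produced.
    print(bson_regex_options(re.compile(b"^hello.*").flags))       # prints an empty line
```

This is why the reproduction above fails even though the pattern was compiled without any explicit flags.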

This regression was discovered when a PyMongo test started failing with the error:

pymongo.errors.OperationFailure:  invalid flag in regex options: u, full error: {'ok': 0.0, 'errmsg': ' invalid flag in regex options: u', 'code': 51108, 'codeName': 'Location51108', '$clusterTime': {'clusterTime': Timestamp(1621462192, 5), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'operationTime': Timestamp(1621462192, 5)}

Server version:

db version v5.0.0-alpha0-475-g96d00d9
Build Info: {
    "version": "5.0.0-alpha0-475-g96d00d9",
    "gitVersion": "96d00d92d71ed9ddc0ac4eb3f60b0b27cb9dcb34",
    "modules": [
        "enterprise"
    ],
    "allocator": "system",
    "environment": {
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

Although the $regex operator documentation does not list u as a supported option, queries using it worked until this build.



 Comments   
Comment by Mickey Winters [ 23/Jun/21 ]

My plan right now is for this ticket to be fixed by fixing SERVER-26991.

Comment by Kyle Suarez [ 23/Jun/21 ]

Hi prashant.mital, the behavior here is a regression in the SBE engine, which has since been turned off by default in SERVER-57758. We are still working on fixing this for SBE, but since this is no longer a bug in the 5.0 release, I am changing the fix version from "5.0 Required" to "Backlog" as it is not a release blocker. CC server-release

Comment by Mickey Winters [ 07/Jun/21 ]

The inconsistency between the SBE and classic engines is going to be resolved by SERVER-57079 so that both ignore unsupported options; however, this will still be inconsistent with aggregation, which does not ignore unsupported options.

Comment by Anton Korshunov [ 28/May/21 ]

This is a regression in the SBE engine:

db.coll.find({a: /a/u})
Error: error: {
        "ok" : 0,
        "errmsg" : " invalid flag in regex options: u",
        "code" : 51108,
        "codeName" : "Location51108"
}
db.adminCommand({setParameter: 1, internalQueryForceClassicEngine: true})
{ "was" : false, "ok" : 1 }
db.coll.find({a: /a/u})

The error comes from flagsToPcreOptions, which takes an argument indicating whether unknown options should be ignored.

In the classic engine this argument is set to true when we construct a RegexMatchExpression and to false when we construct an aggregate ExpressionRegex.

However, SBE uses the same sbe::PcreRegex value and built-in VM functions for both match and aggregate regex expressions, and there the flag is set to false.
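The difference between the two engines comes down to that one boolean. The Python model below is a hypothetical stand-in for the C++ flagsToPcreOptions: its name, signature, and the supported set i, m, s, x are assumptions for illustration, not the server's code. With `ignore_unknown=True` (the classic engine's RegexMatchExpression path) the u option is dropped; with `ignore_unknown=False` (aggregate ExpressionRegex, and SBE's shared regex path) it raises the same error text seen in the report.

```python
import re

# Options this model treats as supported. This is an assumption based on the
# documented $regex options, not a copy of the server's table.
_SUPPORTED = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}


class RegexOptionsError(ValueError):
    """Raised for an unsupported option when unknown options are not ignored."""


def flags_to_pcre_options(options: str, ignore_unknown: bool) -> int:
    """Hypothetical model of flagsToPcreOptions: options string -> re flags."""
    result = 0
    for ch in options:
        if ch in _SUPPORTED:
            result |= _SUPPORTED[ch]
        elif not ignore_unknown:
            # Mirrors the error message reported by the server.
            raise RegexOptionsError(f" invalid flag in regex options: {ch}")
        # Otherwise the unknown option is silently skipped.
    return result


if __name__ == "__main__":
    # Classic find: unknown options such as "u" are ignored.
    print(flags_to_pcre_options("iu", ignore_unknown=True) == re.IGNORECASE)
    # SBE and aggregation: unknown options are rejected.
    try:
        flags_to_pcre_options("u", ignore_unknown=False)
    except RegexOptionsError as err:
        print(repr(str(err)))
```

In this model, making SBE behave like the classic engine for find() only requires passing `ignore_unknown=True` on the match path while leaving the aggregation path unchanged.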

Sending this ticket to QE for re-triaging.

Comment by Bernie Hackett [ 27/May/21 ]

Note that Python regular expressions support all the options listed in the BSON spec, so the server should continue to silently ignore any option that the BSON spec documents but the server itself does not support, rather than returning an error.

Comment by Bernie Hackett [ 27/May/21 ]

Or, if such options are not strictly supported (I have no idea whether PCRE even supports a Unicode option), they should at least continue to be silently ignored.

Comment by Prashant Mital (Inactive) [ 20/May/21 ]

CC: behackett

Comment by Prashant Mital (Inactive) [ 20/May/21 ]

As per the BSON spec:

Regular expression - The first cstring is the regex pattern, the second is the regex options string. Options are identified by characters, which must be stored in alphabetical order. Valid options are 'i' for case insensitive matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w, \W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and 'u' to make \w, \W, etc. match unicode.

So the u option should definitely be supported by the server.
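The two rules the spec places on the options string (only the listed characters, stored in alphabetical order) can be checked in a few lines. `is_valid_bson_regex_options` is a hypothetical helper for illustration. It checks only the encoding rule from the quoted spec text and says nothing about which options a particular server version implements.

```python
# Option characters allowed by the BSON spec, per the quoted text.
BSON_REGEX_OPTIONS = frozenset("ilmsux")


def is_valid_bson_regex_options(options: str) -> bool:
    """Return True if `options` uses only spec-listed characters in alphabetical order."""
    chars = list(options)
    return set(chars) <= BSON_REGEX_OPTIONS and chars == sorted(chars)


if __name__ == "__main__":
    for opts in ["u", "iu", "ui", "g"]:
        print(opts, is_valid_bson_regex_options(opts))
```

By this check, the u option that PyMongo sends is a well-formed BSON regex option, which is the basis for the argument that the server should accept it, or at least ignore it.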

Generated at Thu Feb 08 05:40:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.