[CDRIVER-1883] libbson should ensure regex options are sorted Created: 25/Oct/16  Updated: 17/Apr/17  Resolved: 28/Feb/17

Status: Closed
Project: C Driver
Component/s: libbson
Affects Version/s: None
Fix Version/s: 1.7.0

Type: Improvement Priority: Minor - P4
Reporter: David Golden Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by CXX-1118 BSON Regex flags must be alphabetical... Closed
is depended on by DRIVERS-331 BSON Regex flags must be alphabetical... Closed
Related
related to PHPC-885 Alphabetize Regex flags when instanti... Closed
related to DOCS-9208 Document requirement that regular exp... Closed
is related to DRIVERS-82 Don't compile BSON regexes to native ... Closed
is related to CDRIVER-1889 Implement BSON Corpus tests runner Closed
Epic Link: libbson corpus

 Description   

libbson doesn't appear to validate the options string. It just passes through the exact cstring without checking that only valid letters are included (i.e. "ismxlu").



 Comments   
Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'jmikola', u'name': u'Jeremy Mikola', u'email': u'jmikola@gmail.com'}

Message: Update XFAILS for BSON corpus tests

CDRIVER-1883 was resolved, so regex-valid-005.phpt no longer fails.

CDRIVER-1966 and CDRIVER-1967 were addressed by a new bson_as_extended_json() function for CDRIVER-1947. These tests still fail, but can now depend on PHPC-941, which will update the driver to use that new function.
Branch: master
https://github.com/mongodb/mongo-php-driver/commit/59f3d9d3c415c36a275ddcee0321290d1ffc3656

Comment by A. Jesse Jiryu Davis [ 28/Feb/17 ]

https://github.com/mongodb/libbson/commit/b6144f580251b74c0ab1bf71c8e5ce22c44ae250

Comment by Githook User [ 22/Nov/16 ]

Author:

{u'username': u'jmikola', u'name': u'Jeremy Mikola', u'email': u'jmikola@gmail.com'}

Message: PHPC-829: BSON Regex flags must be alphabetically ordered

We do not test that MongoDB\BSON\fromJSON() alphabetizes the flags, as that will be handled by CDRIVER-1883.
Branch: master
https://github.com/mongodb/mongo-php-driver/commit/dd1a758f00f22f8d5ba269650b4c25a177510b84

Comment by David Golden [ 27/Oct/16 ]

I have amended the BSON corpus already and filed DRIVERS-331.

What I meant by filing a SPEC ticket would be against BSON to remove the alpha sort. I have no interest in that and from what behackett says, there are server reasons why that would likely be rejected anyway.

Comment by Hannes Magnusson [ 27/Oct/16 ]

Did you ever open that SPEC ticket david.golden? Could you link it when you have?

Comment by David Golden [ 25/Oct/16 ]

Same issue sorting on subdocuments. Annoying as heck in Perl where hash key order is guaranteed random (PERL-346).

Comment by Bernie Hackett [ 25/Oct/16 ]

It turns out the order of the options is considered when sorting stored regular expressions. Why anyone would need to sort on stored regular expressions (or store regular expressions in the first place) is beyond me, but that would appear to be the reason for the requirement.

Comment by Bernie Hackett [ 25/Oct/16 ]

Also, we don't document that order matters in the server docs:

https://docs.mongodb.com/manual/reference/operator/query/regex/

Comment by Bernie Hackett [ 25/Oct/16 ]

which must be stored in alphabetical order

Huh, I've never noticed that and know not all drivers do it. It doesn't seem to matter on the server at all. Also, dotall ("s") was added in, if I remember correctly, MongoDB 2.0. So there is precedent for adding support for new flags.

Comment by David Golden [ 25/Oct/16 ]

A softer version of this would be the "accept lax; emit strict" and that's probably better.

Comment by David Golden [ 25/Oct/16 ]

Sounds like we need a SPEC ticket for the BSON spec itself to amend it. I don't know who on the server team "owns" BSON, but maybe we can use this issue to flush that out.

I don't have strong feelings about most of these (e.g. booleans must be 0 or 1 versus 0 or non-zero); I'm just documenting my discoveries as behackett asked me to do.

Comment by Hannes Magnusson [ 25/Oct/16 ]

See linked ticket. Lots of drivers used to even compile this down to their native regex format. Noone ever mentioned that this should be validated or can only contain 6 predefined characters.

Adding this validation I am very worried we will break current use of this type. Be it abuse or not.

Comment by David Golden [ 25/Oct/16 ]

Note, also the "alphabetical order" part. I don't believe libbson checks that either.

Comment by David Golden [ 25/Oct/16 ]

The spec says:

Regular expression - The first cstring is the regex pattern, the second is the regex options string. Options are identified by characters, which must be stored in alphabetical order. Valid options are 'i' for case insensitive matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w, \W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and 'u' to make \w, \W, etc. match unicode.

That's pretty clear what "valid" means.

I have no objection if someone wants to change the spec. If particular projects choose to ignore the spec, that's the responsibility of the project owners.

But, under the principle that the spec is the law, I've added a test case for this to the BSON corpus so we at least learn which drivers comply and which don't.

Comment by Hannes Magnusson [ 25/Oct/16 ]

I've never heard that this should be validated. The server doesn't validate it either, and allowing only a restricted set will prohibit the server from expanding its capabilities in the future - and worse, when it does, we'll never actually realize it and continue to block the new options.

Generated at Wed Feb 07 21:13:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.