[SERVER-38621] Regex $options is ignored when it appears before a $regex BSON regular expression Created: 13/Dec/18  Updated: 29/Oct/23  Resolved: 12/Feb/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.1.9

Type: Bug Priority: Major - P3
Reporter: Shane Harvey Assignee: Evan Nixon
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes PYTHON-1681 Query returns different results depen... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2019-01-14, Query 2019-02-11, Query 2019-02-25
Participants:

 Description   

When both $options and $regex are BSON strings, the order in which they appear in the query does not seem to matter. However, $options appears to be ignored when it appears first and $regex is a BSON regular expression.

import pymongo
import re
from bson import SON
 
client = pymongo.MongoClient()
coll = client.test.test
 
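coll.drop()
# Two documents whose first array element is a stored regex:
# one without flags, one with IGNORECASE. The second element labels the document.
coll.insert_one({'array': [re.compile(b'62', 0), 'no options']})
coll.insert_one({'array': [re.compile(b'62', re.IGNORECASE), 'IGNORECASE']})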
 
for q in [
        {'array': SON([('$options', 'i'), ('$regex', '62')])},
        {'array': SON([('$regex', '62'), ('$options', 'i')])},
        {'array': SON([('$regex', re.compile(b'62', re.IGNORECASE))])},
        {'array': SON([('$regex', re.compile(b'62')), ('$options', 'i')])},
        {'array': SON([('$options', 'i'), ('$regex', re.compile(b'62'))])}]:
    res = list(coll.find(q, projection={'_id': False}))
    print('>>> list(coll.find(%r)):\n%r' % (q, res))

Expected output:

>>> list(coll.find({'array': SON([('$options', 'i'), ('$regex', '62')])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$regex', '62'), ('$options', 'i')])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$regex', re.compile(b'62', re.IGNORECASE))])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$regex', re.compile(b'62')), ('$options', 'i')])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$options', 'i'), ('$regex', re.compile(b'62'))])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]

Actual output (notice the difference in the final query):

>>> list(coll.find({'array': SON([('$options', 'i'), ('$regex', '62')])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$regex', '62'), ('$options', 'i')])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$regex', re.compile(b'62', re.IGNORECASE))])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$regex', re.compile(b'62')), ('$options', 'i')])})):
[{'array': [Regex('62', 2), 'IGNORECASE']}]
>>> list(coll.find({'array': SON([('$options', 'i'), ('$regex', re.compile(b'62'))])})):
[{'array': [Regex('62', 0), 'no options']}]



 Comments   
Comment by Githook User [ 12/Feb/19 ]

Author:

{'name': 'Evan Nixon', 'email': 'evan.nixon@10gen.com', 'username': 'navenoxin'}

Message: SERVER-38621 Do not ignore regex options when specified first
Branch: master
https://github.com/mongodb/mongo/commit/613454eb99abe25682f9a50d93ee04e7e90ba314

Comment by David Storch [ 14/Jan/19 ]

The flawed parsing logic is here:

https://github.com/mongodb/mongo/blob/cf6e22331a81dac4e3c3800c9b94c0df1b439737/src/mongo/db/matcher/expression_parser.cpp#L536-L569

The parsing logic handles $regex and $options independently, so the options associated with whichever keyword comes second overwrite the options associated with the first. Instead of just taking the second set of options, I propose that we throw an error if options are specified in both places and otherwise use whichever options set is non-empty. (To avoid a backwards compatibility break, we can neither prevent $options from being specified before $regex nor require the BSON type of $regex to be a string.)
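
For illustration only, the difference between the two strategies can be modelled in a few lines of Python. This is a simplified sketch, not the server's C++ parser and not the committed fix: the function names (parse_overwriting, parse_proposed, flags_to_options) are hypothetical, a query is represented as an ordered list of (keyword, value) pairs, and a compiled Python pattern stands in for a BSON regular expression.

```python
import re


def flags_to_options(compiled):
    """Translate a few Python regex flags into an options string (hypothetical helper)."""
    mapping = [(re.IGNORECASE, "i"), (re.MULTILINE, "m"),
               (re.DOTALL, "s"), (re.VERBOSE, "x")]
    return "".join(letter for flag, letter in mapping if compiled.flags & flag)


def parse_overwriting(spec):
    """Model of the order-dependent behaviour: each keyword sets the options
    on its own, so whichever keyword comes second wins."""
    pattern, options = None, ""
    for key, value in spec:
        if key == "$regex":
            if isinstance(value, re.Pattern):
                pattern = value.pattern
                # The regex's own (possibly empty) flags replace earlier options.
                options = flags_to_options(value)
            else:
                pattern = value
        elif key == "$options":
            options = value
    return pattern, options


def parse_proposed(spec):
    """Model of the proposal: track both sources of options, reject the query
    if both are non-empty, otherwise use whichever one is non-empty."""
    pattern, regex_flags, explicit_options = None, "", ""
    for key, value in spec:
        if key == "$regex":
            if isinstance(value, re.Pattern):
                pattern = value.pattern
                regex_flags = flags_to_options(value)
            else:
                pattern = value
        elif key == "$options":
            explicit_options = value
    if regex_flags and explicit_options:
        raise ValueError("options set in both $regex and $options")
    return pattern, regex_flags or explicit_options


options_first = [("$options", "i"), ("$regex", re.compile("62"))]
regex_first = [("$regex", re.compile("62")), ("$options", "i")]

print(parse_overwriting(options_first))  # the 'i' option is lost
print(parse_overwriting(regex_first))    # the 'i' option is kept
print(parse_proposed(options_first))     # same result in either order
print(parse_proposed(regex_first))
```

In this model the overwriting parser drops the 'i' option only when $options precedes a compiled regex, which mirrors the final query in the report, while the proposed parser returns the same result for both orderings.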

I'm moving this back to "Needs Scheduling" state so that the query team can re-triage it.

Comment by Asya Kamsky [ 17/Dec/18 ]

Could this be the same underlying reason as mentioned here?
