[SERVER-15235] Regex query returns incorrect results when index is present Created: 12/Sep/14  Updated: 11/Jul/16  Resolved: 12/Nov/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.11, 2.6.3, 2.7.6
Fix Version/s: 2.8.0-rc1

Type: Bug Priority: Major - P3
Reporter: Victor Hooi Assignee: David Storch
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File regex_different_results_with_index.js    
Issue Links:
Duplicate
is duplicated by SERVER-13199 regex uses too strict index bounds wh... Closed
Related
related to SERVER-20432 $regex prefix search with escaped "|"... Closed
related to SERVER-22872 Order by is not working in 3.2.3 Closed
related to SERVER-22890 3.2.3 performance regression Closed
is related to SERVER-16622 RegEx query predicates using the | (v... Backlog
Operating System: ALL
Steps To Reproduce:

See regex_different_results_with_index.js

Participants:

 Description   

We've noticed a strange issue where a regex query on a field returns different results, depending on whether there is an index defined on that field or not.

I've included a JS test file (regex_different_results_with_index.js) which reproduces the issue.

The regex is left-anchored. Also, an equavilent regex expression (included in JS) works in both cases:

  • /^(a(a|$)|b)/ : works both with and without index
  • /^a(a|$)|^b/ : works without an index, does not work with an index

Looking at the regex expression, the result without the index (both "aa" and "bb" match) should be the correct one. When the index is present, only "aa" is returned, which is incorrect.

I have tested this on 2.4.11 and 2.6.3.



 Comments   
Comment by Ramon Fernandez Marina [ 03/Mar/16 ]

Thanks for opening a new ticket itwebtf@saxobank.com.

The 2.8 release was renamed 3.0, so anything with a 2.8.0-rc0 fixVersion is included in all 3.x versions.

Comment by ITWEBTF SAXOBANK [ 03/Mar/16 ]

Hi Ramon,

This ticket is market with 2.8.0-rc0, but I know it is in 3.x releases.Could you inform me about what exact 3.x release this bug fix went into?

I have created a new ticket SERVER-22939.

Regards,
Brian

Comment by Ramon Fernandez Marina [ 03/Mar/16 ]

itwebtf@saxobank.com, this fix was published in a release, so in order to avoid confusion we need a new ticket if you think this behavior needs to change. If you feel none of the ones you've opened already are good candidates please feel free to open a new one.

Thanks,
Ramón.

Comment by ITWEBTF SAXOBANK [ 02/Mar/16 ]

This must be reopened.

The problem with this fix is that, even when I have a regular expression that has a left-anchor and an escaped '|' character, that query is recognized as being "non-simple".

This is wrong, and it causes unnecessarily that backwards compatibility is broken and performance is significantly degraded. See SERVER-22872 and SERVER-22890.

Comment by Githook User [ 12/Nov/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-15235 regular expressions with the '|' character must be classified as simple
Branch: master
https://github.com/mongodb/mongo/commit/866d3851fcb7a6b230cb2fde7b2a6253ba32eeb5

Comment by J Rassi [ 03/Nov/14 ]

No.

This issues exposes a bug in the code that parses regular expression query predicates to generate index bounds, for regular expressions that have multiple left-anchor subexpressions joined by an OR. In the example expression /^a(a|$)|^b/, the index bounds ["a", "b") are generated, which causes the query to miss all documents that start with 'b'. The problem is that the parser looks at the the first two characters ('^' and then 'a'), and then incorrectly stops at the third character (the left paren) and reports that the regular expression is a left-anchored expression with prefix "a".

Comment by Tom Robinson [ 03/Nov/14 ]

Could this issue be related: http://stackoverflow.com/questions/26631352/can-i-do-a-mongodb-starts-with-query-on-an-indexed-subdocument-field/26635039?noredirect=1#comment41905713_26635039

Generated at Thu Feb 08 03:37:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.