[SERVER-22722] Ranged regex uses inefficient indexBounds Created: 18/Feb/16  Updated: 22/Feb/16  Resolved: 22/Feb/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Mircea Gaceanu
Assignee: Kelsey Schubert
Resolution: Duplicate
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-9938 Inefficient index boundary selected w... Backlog
Operating System: ALL
Participants:

 Description   

When a query uses an anchored regex that contains a character range, the planner adds an extra index bound built only from the literal part of the regex that precedes the range.
For example:

db.coll.find({ _id: /^1541\/1\/F\/8\/1\/2014\/0[8-9]/ }).explain("executionStats")

The resulting indexBounds are:

{
    "_id" : [ 
        "[\"1541/1/F/8/1/2014/0\", \"1541/1/F/8/1/2014/1\")", 
        "[/^1541\\/1\\/F\\/8\\/1\\/2014\\/0[8-9]/, /^1541\\/1\\/F\\/8\\/1\\/2014\\/0[8-9]/]"
    ]
}

As a result, the query scans far more index keys than necessary. In this case the _id is a date prefixed with other information, so the additional string bound covers every month from 01 to 09 instead of only 08 and 09.
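One possible workaround is to state the narrower string range explicitly alongside the regex. The sketch below only builds the query document. The helper name classRange is hypothetical, and it assumes simple binary string comparison, where ":" is the character immediately after "9". Whether the planner actually uses the tighter bounds should be confirmed with explain().

```javascript
// Hypothetical helper: builds a string range that covers every value
// starting with `prefix` followed by one character in [lo-hi].
// Assumes binary (code point) string ordering.
function classRange(prefix, lo, hi) {
  return {
    $gte: prefix + lo,
    // Exclusive upper bound: the character right after `hi`.
    $lt: prefix + String.fromCharCode(hi.charCodeAt(0) + 1),
  };
}

const range = classRange("1541/1/F/8/1/2014/0", "8", "9");

// Keep the regex as the exact filter and add the range as a hint for the bounds.
const query = {
  _id: Object.assign({}, range, { $regex: /^1541\/1\/F\/8\/1\/2014\/0[8-9]/ }),
};

console.log(range.$gte); // 1541/1/F/8/1/2014/08
console.log(range.$lt);  // 1541/1/F/8/1/2014/0:
```

In the shell this document would be passed as db.coll.find(query).explain("executionStats") to compare the number of keys examined against the original query.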



 Comments   
Comment by Kelsey Schubert [ 22/Feb/16 ]

Hi mirceag,

Thank you for the report. The index bounds builder does not create index bounds for PCRE meta-characters, which include character class definitions such as [8-9]. Consequently, the index only accelerates the regular expression match up to the 0 character, as the explain output shows. If you are curious, you can find the code that governs this behavior here
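The string bound in the explain output can be reproduced with a small sketch. This is not the server's implementation: literalPrefixBounds is a hypothetical helper that takes the literal prefix of an anchored pattern, stops at the first meta-character, and forms a half-open range by incrementing the last character of that prefix. It conservatively gives up on any pattern containing "|".

```javascript
// Hypothetical illustration, not MongoDB's actual bounds builder.
// Returns { lower, upper } for the range [lower, upper), or null if no
// usable literal prefix exists.
function literalPrefixBounds(source) {
  // Only anchored patterns have a prefix; bail out on alternation.
  if (!source.startsWith("^") || source.includes("|")) return null;
  const meta = "[](){}.*+?^$";
  let prefix = "";
  for (let i = 1; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      const next = source[i + 1];
      // Escaped letters or digits (\d, \w, ...) are not literals: stop here.
      if (next === undefined || /[A-Za-z0-9]/.test(next)) break;
      prefix += next; // escaped punctuation such as \/ is a literal
      i++;
      continue;
    }
    if (meta.includes(ch)) {
      // A quantifier that allows zero repetitions makes the previous
      // character optional, so it cannot be part of the prefix.
      if ("*?{".includes(ch)) prefix = prefix.slice(0, -1);
      break;
    }
    prefix += ch;
  }
  if (prefix === "") return null;
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper: upper };
}

const bounds = literalPrefixBounds(/^1541\/1\/F\/8\/1\/2014\/0[8-9]/.source);
console.log(bounds.lower); // 1541/1/F/8/1/2014/0
console.log(bounds.upper); // 1541/1/F/8/1/2014/1
```

The prefix ends at the 0 because the next character, [, opens the character class, which matches the first interval reported in the indexBounds.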

SERVER-9938 describes an improvement to the index boundary for character classes. Please feel free to vote for it and watch it for updates.

Kind regards,
Thomas

Generated at Thu Feb 08 04:01:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.