Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: None
Component/s: Querying
Labels:
None

Assigned Teams:

Query Optimization
Backwards Compatibility:
Fully Compatible
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Case insensitive left-anchored regular expressions whose first few characters are non-special and ASCII don't need to do full index scans, but can be exploded out to 2^n range queries with regex filters where n is the number of non-special characters at the beginning of the regex.

For example, this query:

db.foo.find( { s : /^ab/i } );

could be converted in mongod to anything functionally equivalent to

db.foo.find({"$or": [ { s : /^ab/ }, { s : /^Ab/ }, { s : /^aB/ }, { s : /^AB/ } ] );

(However, I do not specifically request that the mongod actually run that $or query, just anything analogous to it.)

Obviously the exponential explosion in disjuncts would be undesirable for sufficiently long prefixes, but perhaps an expansion of the first few characters' worth could be a win for people compared the full index scan we currently do.

(In this issue I restrict the request to just left anchored regular expressions corresponding to a prefix of ASCII characters because restricting the problem just to ASCII strings reduces the complexity of implementation and perhaps maintenance. This is not intended to be a substitute for introduction of any locale aware, case insensitive collations, just an optimization request for a feature that's unlikely to go away even if we do grow locale awareness.)

Testing: presumably this improvement would somehow be reflected in the explain() plan. So certain aspects of testing could include comparing explain plains from case-sensitive and case-insensitive regular expression queries.

Documentation changes needed: if we document the absence of optimizations for case-insensitive regular expressions, we should cease to do so if this improvement is made. (Whether we also document the existence of the new optimization is a separate question about which I have no opinion.)

is duplicated by

SERVER-49002 Utilize index for anchored case-insensitive $regex

Closed

Assignee:: [DO NOT USE] Backlog - Query Optimization
Reporter:: Richard Kreuter (Inactive)
Participants:: [DO NOT USE] Backlog - Query Optimization, Richard Kreuter
Votes:: 1 Vote for this issue
Watchers:: 19 Start watching this issue

Created:: Jun 06 2014 10:13:25 PM UTC
Updated:: Nov 28 2023 07:30:57 PM UTC

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates