[SERVER-14197] Case insensitive left-anchored regular expressions whose first few characters are non-special and ASCII don't need to do full index scans Created: 06/Jun/14 Updated: 28/Nov/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Richard Kreuter (Inactive) | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Optimization
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
Case insensitive left-anchored regular expressions whose first few characters are non-special and ASCII don't need to do full index scans, but can be exploded out to 2^n range queries with regex filters where n is the number of non-special characters at the beginning of the regex. For example, this query:
could be converted in mongod to anything functionally equivalent to
(However, I do not specifically request that the mongod actually run that $or query, just anything analogous to it.) Obviously the exponential explosion in disjuncts would be undesirable for sufficiently long prefixes, but perhaps an expansion of the first few characters' worth could be a win for people compared the full index scan we currently do. (In this issue I restrict the request to just left anchored regular expressions corresponding to a prefix of ASCII characters because restricting the problem just to ASCII strings reduces the complexity of implementation and perhaps maintenance. This is not intended to be a substitute for introduction of any locale aware, case insensitive collations, just an optimization request for a feature that's unlikely to go away even if we do grow locale awareness.) Testing: presumably this improvement would somehow be reflected in the explain() plan. So certain aspects of testing could include comparing explain plains from case-sensitive and case-insensitive regular expression queries. Documentation changes needed: if we document the absence of optimizations for case-insensitive regular expressions, we should cease to do so if this improvement is made. (Whether we also document the existence of the new optimization is a separate question about which I have no opinion.) |