Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- neweng

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When a regular expression predicate is of the form {f1 : "^abc.*"}, that is the regular expression itself is a prefix search, and the regular expression is a prefix of an index, it is possible to rewrite the regular expression into an equivalent inequality of the form:

{$gte: "abc", $lt: "abd"}

where the argument of $lt is the same string, but the last character is incremented with one (that is, it is the next character). This can be done safely if there are no collations.

There already exists logic that generates the equivalent inequality condition, however the optimizer still produces an IXSCAN instead of DISTINCT_SCAN. The reason for the less efficient plan is that IndexBoundsBuilder::simpleRegex (called from IndexBoundsBuilder::translateRegex) doesn't detect that this is a prefix search, and returns IndexBoundsBuilder::INEXACT_COVERED instead of IndexBoundsBuilder::EXACT.

A GDB experiment, that sets the out parameter 'tightnessOut' of simpleRegex, results in the desired DISTINCT_SCAN plan. Consider the example below:

db.coll1.insertMany([{f1:'abc'}, {f1:'abd'}]);
db.coll1.createIndex({a:1})
b.coll1.explain().distinct("a", {a: {"$regex": "^ab.*"}});

Now break at the end of IndexBoundsBuilder::simpleRegex, and

set *tightnessOut=IndexBoundsBuilder::EXACT

This results in the following plan:

    winningPlan: {
      stage: 'PROJECTION_COVERED',
      transformBy: {},
      inputStage: {
        stage: 'DISTINCT_SCAN',
        keyPattern: { a: 1 },
        indexName: 'a_1',
        isMultiKey: false,
        multiKeyPaths: { a: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { a: [ '["ab", "ac")', '[/^ab/, /^ab/]' ] }
      }

Based on this experiment, it seems that in order to improve the plan it is sufficient to extend IndexBoundsBuilder::simpleRegex to recognize simple prefix regular expressions.

The implementation of this task should make sure that it understands the exact conditions under which a prefix regular expression will result in equivalent upper/lower index bounds, and return IndexBoundsBuilder::EXACT only in those circumstances.

Assignee:: Ribhav Jain (Inactive)
Reporter:: Timour Katchaounov
Participants:: Githook User, Ribhav Jain, Timour Katchaounov
Votes:: 0 Vote for this issue
Watchers:: 6 Start watching this issue

Created:: Nov 26 2021 02:44:54 PM UTC
Updated:: Feb 16 2022 04:04:23 PM UTC
Resolved:: Feb 16 2022 04:04:23 PM UTC
Confidence Status Last Update:: 16/Dec/21 8:09 PM

Details

Description

Attachments

Activity

People

Dates