[SERVER-61750] Use DISTINCT_SCAN when a regex is a prefix of an index Created: 26/Nov/21  Updated: 16/Feb/22  Resolved: 16/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Timour Katchaounov Assignee: Ribhav Jain (Inactive)
Resolution: Done Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Participants:

 Description   

When a regular expression predicate is of the form {f1 : "^abc.*"}, that is the regular expression itself is a prefix search, and the regular expression is a prefix of an index, it is possible to rewrite the regular expression into an equivalent inequality of the form:

{$gte: "abc", $lt: "abd"}

where the argument of $lt is the same string, but the last character is incremented with one (that is, it is the next character). This can be done safely if there are no collations.

There already exists logic that generates the equivalent inequality condition, however the optimizer still produces an IXSCAN instead of DISTINCT_SCAN. The reason for the less efficient plan is that IndexBoundsBuilder::simpleRegex (called from IndexBoundsBuilder::translateRegex) doesn't detect that this is a prefix search, and returns IndexBoundsBuilder::INEXACT_COVERED instead of IndexBoundsBuilder::EXACT.

A GDB experiment, that sets the out parameter 'tightnessOut' of simpleRegex, results in the desired DISTINCT_SCAN plan. Consider the example below:

db.coll1.insertMany([{f1:'abc'}, {f1:'abd'}]);
db.coll1.createIndex({a:1})
b.coll1.explain().distinct("a", {a: {"$regex": "^ab.*"}});

Now break at the end of IndexBoundsBuilder::simpleRegex, and

set *tightnessOut=IndexBoundsBuilder::EXACT

This results in the following plan:

    winningPlan: {
      stage: 'PROJECTION_COVERED',
      transformBy: {},
      inputStage: {
        stage: 'DISTINCT_SCAN',
        keyPattern: { a: 1 },
        indexName: 'a_1',
        isMultiKey: false,
        multiKeyPaths: { a: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { a: [ '["ab", "ac")', '[/^ab/, /^ab/]' ] }
      }

 

Based on this experiment, it seems that in order to improve the plan it is sufficient to extend IndexBoundsBuilder::simpleRegex to recognize simple prefix regular expressions.

The implementation of this task should make sure that it understands the exact conditions under which a prefix regular expression will result in equivalent upper/lower index bounds, and return IndexBoundsBuilder::EXACT only in those circumstances.



 Comments   
Comment by Githook User [ 10/Jan/22 ]

Author:

{'name': 'Ribhav Jain', 'email': 'ribhav.jain@mongodb.com', 'username': 'ribhavjain'}

Message: SERVER-61750: Use DISTINCT_SCAN when a regex is a prefix of an index
Branch: master
https://github.com/mongodb/mongo/commit/f97ee7d37f1a0e18f248d0726823445019409b50

Comment by Timour Katchaounov [ 29/Nov/21 ]

Yes, this is a quick win candidate.

Generated at Thu Feb 08 05:53:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.