Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-61750

Use DISTINCT_SCAN when a regex is a prefix of an index

    • Type: Icon: Improvement Improvement
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      When a regular expression predicate is of the form {f1 : "^abc.*"}, that is the regular expression itself is a prefix search, and the regular expression is a prefix of an index, it is possible to rewrite the regular expression into an equivalent inequality of the form:

      {$gte: "abc", $lt: "abd"}

      where the argument of $lt is the same string, but the last character is incremented with one (that is, it is the next character). This can be done safely if there are no collations.

      There already exists logic that generates the equivalent inequality condition, however the optimizer still produces an IXSCAN instead of DISTINCT_SCAN. The reason for the less efficient plan is that IndexBoundsBuilder::simpleRegex (called from IndexBoundsBuilder::translateRegex) doesn't detect that this is a prefix search, and returns IndexBoundsBuilder::INEXACT_COVERED instead of IndexBoundsBuilder::EXACT.

      A GDB experiment, that sets the out parameter 'tightnessOut' of simpleRegex, results in the desired DISTINCT_SCAN plan. Consider the example below:

      db.coll1.insertMany([{f1:'abc'}, {f1:'abd'}]);
      db.coll1.createIndex({a:1})
      b.coll1.explain().distinct("a", {a: {"$regex": "^ab.*"}});
      

      Now break at the end of IndexBoundsBuilder::simpleRegex, and

      set *tightnessOut=IndexBoundsBuilder::EXACT

      This results in the following plan:

          winningPlan: {
            stage: 'PROJECTION_COVERED',
            transformBy: {},
            inputStage: {
              stage: 'DISTINCT_SCAN',
              keyPattern: { a: 1 },
              indexName: 'a_1',
              isMultiKey: false,
              multiKeyPaths: { a: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: { a: [ '["ab", "ac")', '[/^ab/, /^ab/]' ] }
            }
      

       

      Based on this experiment, it seems that in order to improve the plan it is sufficient to extend IndexBoundsBuilder::simpleRegex to recognize simple prefix regular expressions.

      The implementation of this task should make sure that it understands the exact conditions under which a prefix regular expression will result in equivalent upper/lower index bounds, and return IndexBoundsBuilder::EXACT only in those circumstances.

            Assignee:
            ribhav.jain@mongodb.com Ribhav Jain (Inactive)
            Reporter:
            timour.katchaounov@mongodb.com Timour Katchaounov
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: