Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-49002

Utilize index for anchored case-insensitive $regex

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 4.2.8, 4.4.0-rc8
    • Fix Version/s: None
    • Component/s: Indexing, Performance, Querying
    • Labels:

      Description

      Anchored regular expressions are really fast with an index. Even if they are not fast enough, the index greatly reduces the number of examined keys and docs. Let's take the following query:

      { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } }
      

      This query correctly uses IXSCAN and results in a scan over the whole index (~1.8m keys in my case).

       

      However, the following query:

      {
        $and: [
          { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } },
          {
            $or: [
              { field: { $regex: '^z' } },
              { field: { $regex: '^Z' } }
            ]
          }
        ]
      }

      Yields the same results but utilizes the index far better, performing two IXSCANS and combining their results with an OR stage (~65k examined keys).

       

      I thought that I could push this idea even further:

      {
        $and: [
          { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } },
          {
            $or: [
              { field: { $regex: '^zz' } },
              { field: { $regex: '^zZ' } },
              { field: { $regex: '^Zz' } },
              { field: { $regex: '^ZZ' } }
            ]
          }
        ]
      }
      

      This results in 4 INDEXSCAN}}s and one {{OR (~2.2k examined keys).

       

      To summarize, my point would be to either:

      • Explain to me that these queries are not equivalent. Maybe some crazy Unicode stuff won't work. Even so, the query plan could apply such an optimization only if possible.
      • Somehow incorporate it into the query plan.

       

      If needed, I can provide exact query plans along with their execution stats. Tested on 4.2.8 and 4.4.0-rc.8. I haven't tested older versions, but I guess it'll be the same.

        Attachments

        1. direct.json
          3 kB
        2. or-2.json
          17 kB
        3. or-4.json
          27 kB

          Issue Links

            Activity

              People

              Assignee:
              asya Asya Kamsky
              Reporter:
              radoslaw.miernik@vazco.eu Radosław Miernik
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: