Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.2.8, 4.4.0-rc8
Component/s: Index Maintenance, Performance, Querying
Labels:
- qopt-team

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Anchored regular expressions are really fast with an index. Even if they are not fast enough, the index greatly reduces the number of examined keys and docs. Let's take the following query:

{ field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } }

This query correctly uses IXSCAN and results in a scan over the whole index (~1.8m keys in my case).

However, the following query:

{
  $and: [
    { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } },
    {
      $or: [
        { field: { $regex: '^z' } },
        { field: { $regex: '^Z' } }
      ]
    }
  ]
}

Yields the same results but utilizes the index far better, performing two IXSCANs and combining their results with an OR stage (~65k examined keys).

I thought that I could push this idea even further:

{
  $and: [
    { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } },
    {
      $or: [
        { field: { $regex: '^zz' } },
        { field: { $regex: '^zZ' } },
        { field: { $regex: '^Zz' } },
        { field: { $regex: '^ZZ' } }
      ]
    }
  ]
}

This results in four IXSCANs and one OR stage (~2.2k examined keys).
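The manual rewrite above can be generated mechanically. The following is a minimal sketch, not anything MongoDB or its drivers provide: `casePrefixes` and `buildPrefixedQuery` are hypothetical helper names, and the code assumes the literal contains only ASCII characters and no regex metacharacters.

```javascript
// Hypothetical helper (not a MongoDB API): returns every upper/lower-case
// variant of the first `n` characters of `literal`.
// Assumes `literal` is plain ASCII with no regex metacharacters.
function casePrefixes(literal, n) {
  let variants = [''];
  for (const ch of literal.slice(0, n)) {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    // Characters without case (digits, for example) contribute one variant.
    const choices = lower === upper ? [ch] : [lower, upper];
    variants = variants.flatMap((prefix) => choices.map((c) => prefix + c));
  }
  return variants;
}

// Hypothetical helper: combines the original case-insensitive match with
// case-sensitive anchored prefixes in an $and / $or query.
function buildPrefixedQuery(field, literal, n) {
  return {
    $and: [
      { [field]: { $regex: `^${literal}$`, $options: 'i' } },
      { $or: casePrefixes(literal, n).map((p) => ({ [field]: { $regex: `^${p}` } })) },
    ],
  };
}
```

Calling `buildPrefixedQuery('field', 'zzzsyvgT2uQYd9xEB', 2)` produces the four-branch query shown above. The number of `$or` branches doubles with every cased character, so `n` has to stay small.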

To summarize, I would like one of the following:

- An explanation of why these queries are not equivalent. Maybe some crazy Unicode stuff won't work. Even so, the query planner could apply such an optimization only when it is safe.
- This optimization incorporated into the query planner.
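One way to read "only if possible" is as a conservative guard that decides how many leading characters of a pattern can be expanded at all. The sketch below is an assumption about what such a check might look like, not the planner's actual rule. `safePrefixLength` is a hypothetical name, and the choice to accept only ASCII letters and digits is deliberately strict. It also assumes the `m` and `x` regex options are not in use, and it does not settle the Unicode question raised above.

```javascript
// Hypothetical guard (not MongoDB's real logic): counts how many characters
// directly after a leading '^' are plain ASCII letters or digits that must
// appear literally at the start of every match.
function safePrefixLength(pattern) {
  // An unanchored pattern, or one with alternation, gives no guaranteed prefix.
  if (!pattern.startsWith('^') || pattern.includes('|')) return 0;
  let n = 0;
  for (let i = 1; i < pattern.length; i++) {
    if (!/[A-Za-z0-9]/.test(pattern[i])) break;
    // A quantifier after this character may make it optional or repeated,
    // so stop before it to stay on the safe side.
    const next = pattern[i + 1];
    if (next !== undefined && '*?+{'.includes(next)) break;
    n++;
  }
  return n;
}
```

For the pattern in this ticket the guard accepts the whole literal, while a pattern such as `'^ab*c'` yields a safe prefix of only one character and an unanchored pattern yields zero.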

If needed, I can provide exact query plans along with their execution stats. Tested on 4.2.8 and 4.4.0-rc8. I haven't tested older versions, but I expect the behavior to be the same.

direct.json
3 kB
Jun 22 2020 09:31:11 PM UTC
or-2.json
17 kB
Jun 22 2020 09:31:07 PM UTC
or-4.json
27 kB
Jun 22 2020 09:31:07 PM UTC

duplicates

SERVER-14197 Case insensitive left-anchored regular expressions whose first few characters are non-special and ASCII don't need to do full index scans

Backlog

Assignee: Asya Kamsky
Reporter: Radosław Miernik
Participants: Asya Kamsky, Carl Champain, Radosław Miernik
Votes: 0
Watchers: 6

Created: Jun 22 2020 10:10:47 AM UTC
Updated: Jul 07 2020 04:10:17 PM UTC
Resolved: Jul 07 2020 04:10:17 PM UTC