[SERVER-49002] Utilize index for anchored case-insensitive $regex Created: 22/Jun/20  Updated: 07/Jul/20  Resolved: 07/Jul/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Performance, Querying
Affects Version/s: 4.2.8, 4.4.0-rc8
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Radosław Miernik Assignee: Asya Kamsky
Resolution: Duplicate Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File direct.json     File or-2.json     File or-4.json    
Issue Links:
Duplicate
duplicates SERVER-14197 Case insensitive left-anchored regula... Backlog
Participants:

 Description   

Anchored regular expressions are really fast with an index. Even if they are not fast enough, the index greatly reduces the number of examined keys and docs. Let's take the following query:

{ field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } }

This query correctly uses IXSCAN and results in a scan over the whole index (~1.8m keys in my case).

 

However, the following query:

{
  $and: [
    { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } },
    {
      $or: [
        { field: { $regex: '^z' } },
        { field: { $regex: '^Z' } }
      ]
    }
  ]
}

Yields the same results but utilizes the index far better, performing two IXSCANS and combining their results with an OR stage (~65k examined keys).

 

I thought that I could push this idea even further:

{
  $and: [
    { field: { $regex: '^zzzsyvgT2uQYd9xEB$', $options: 'i' } },
    {
      $or: [
        { field: { $regex: '^zz' } },
        { field: { $regex: '^zZ' } },
        { field: { $regex: '^Zz' } },
        { field: { $regex: '^ZZ' } }
      ]
    }
  ]
}

This results in 4 INDEXSCAN}}s and one {{OR (~2.2k examined keys).

 

To summarize, my point would be to either:

  • Explain to me that these queries are not equivalent. Maybe some crazy Unicode stuff won't work. Even so, the query plan could apply such an optimization only if possible.
  • Somehow incorporate it into the query plan.

 

If needed, I can provide exact query plans along with their execution stats. Tested on 4.2.8 and 4.4.0-rc.8. I haven't tested older versions, but I guess it'll be the same.



 Comments   
Comment by Asya Kamsky [ 07/Jul/20 ]

Hi radoslaw.miernik@vazco.eu - it looks like there's already an enhancement request to do exactly what you describe in SERVER-14197 so I'm closing this ticket as duplicate of the older one.

 

Please follow and upvote the other ticket!

Thanks,

Asya

 

Comment by Carl Champain (Inactive) [ 23/Jun/20 ]

Thank you, radoslaw.miernik@vazco.eu!

We are passing this ticket along to the appropriate team for further review. Updates will be posted on this ticket as they happen.

Comment by Radosław Miernik [ 22/Jun/20 ]

Of course! I've attached them as .json files.

 

 

Comment by Carl Champain (Inactive) [ 22/Jun/20 ]

Hi radoslaw.miernik@vazco.eu,

Thank you for the report.
Can you please provide the explain output for each query?

Kind regards,
Carl

Generated at Thu Feb 08 05:18:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.