[SERVER-35892] performance regression with lookahead regex Created: 28/Jun/18  Updated: 04/Nov/18  Resolved: 01/Oct/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Roger Gonzalez Assignee: Nick Brewer
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagnostics.tgz    
Operating System: ALL
Steps To Reproduce:
  • create a collection of documents with an indexed field containing at least 500 characters of words
  • compare search performance of field:/(?=.*one)(?=.*two)(?=.*three)/ on 3.2 and 3.6. Make sure to try both matching and non-matching inputs.
  • contrast with $and: [{field:/(one)/}, {field:/(two)/}, {field:/(three)/}] as a sanity check
Participants:

 Description   

We recently upgraded our production servers from 3.2 to 3.6, and started noticing huge CPU spikes and long transactions (60s+) on code that previously caused no issues.

The backing collection has about 2000 documents. The (indexed) description field is a block of up to 512 characters. The normal query has some other filters that narrow the matching set down to about 800 documents, and then this clause is the critical feature (driven by incremental search from web clients):

{{
{
  "description" : /(?=.*first)(?=.*second)/
}
}}

We add more terms as the user types them.

In our old 3.2 mongo, this query took about 30ms at most. On 3.6, just two terms take over 30000ms, and three terms take over 70000ms. It's worst when the first term(s) actually match!

Rewriting the query to

{{
{
  "$and" : [
    {
      "description" : /(first)/
    },
    {
      "description" : /(second)/
    }
  ]
}
}}

was slower on 3.2 (from 30ms for the combined regex to 45ms for the $and clauses) but much, much faster on 3.6 (from 30000ms to 70ms).
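
For anyone trying to reproduce this, here is a minimal sketch of how the two query shapes above could be built from the terms a user has typed so far. The helper names (escapeRegex, buildLookaheadFilter, buildAndFilter) are hypothetical and not taken from our application code, and the capture groups from the original $and query are omitted since they do not affect what matches. The two local match functions use plain JavaScript RegExp, not the server's regex engine, so they only check that both shapes select the same strings; they say nothing about server-side timing.

```javascript
// Hypothetical helper: escape regex metacharacters so typed terms match literally.
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Single-regex shape from the report: one lookahead per search term.
function buildLookaheadFilter(field, terms) {
  const source = terms.map((t) => `(?=.*${escapeRegex(t)})`).join("");
  return { [field]: new RegExp(source) };
}

// $and shape from the report: one unanchored regex per search term.
function buildAndFilter(field, terms) {
  return { $and: terms.map((t) => ({ [field]: new RegExp(escapeRegex(t)) })) };
}

// Local checks only (JavaScript RegExp, assumed close enough for simple terms).
function matchesLookahead(filter, field, value) {
  return filter[field].test(value);
}

function matchesAnd(filter, field, value) {
  return filter.$and.every((clause) => clause[field].test(value));
}

const terms = ["first", "second"];
const lookahead = buildLookaheadFilter("description", terms);
const anded = buildAndFilter("description", terms);

for (const sample of ["the second item comes first", "only first here"]) {
  console.log(
    sample,
    matchesLookahead(lookahead, "description", sample),
    matchesAnd(anded, "description", sample)
  );
}
```

One caveat with this sketch: in JavaScript, `.` does not match a line break by default, so the lookahead shape can miss a term that appears after a newline while the $and shape still finds it. The two shapes agree for single-line values.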



 Comments   
Comment by Nick Brewer [ 01/Oct/18 ]

argh Since there hasn't been any activity on this ticket in some time, I'm going to close it. Feel free to comment here if you'd like us to reopen this issue.

-Nick

Comment by Nick Brewer [ 14/Sep/18 ]

argh Are you still seeing this issue? If so, we'll need the information previously requested to continue investigating.

Thanks,
-Nick

Comment by Nick Brewer [ 06/Aug/18 ]

argh Sorry for the delay in getting back to you on this. Looking at the diagnostic data, I'm not seeing the sort of spikes in CPU utilization and wait times that you're describing. What are you using to determine the increase? 

It would be useful to see the .explain(true) output for both of these queries, and any log messages you're seeing when the queries are run.

Thanks,
Nick

Comment by Roger Gonzalez [ 29/Jun/18 ]

Done. This was recorded off the primary, let me know if you need anything off a secondary.

Comment by Nick Brewer [ 29/Jun/18 ]

Hi argh

Would you please archive (tar or zip) the $dbpath/diagnostic.data directory and attach it to this ticket?

Thank you,
Nick

Comment by Roger Gonzalez [ 28/Jun/18 ]

gah, apologies for the mangled formatting!
