  1. Core Server
  2. SERVER-35892

performance regression with lookahead regex

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Querying
    • Labels:
      None
    • ALL
    • Steps to reproduce:
      • Create a collection of documents with an indexed field containing at least 500 characters of words.
      • Compare the search performance of field:/(?=.*one)(?=.*two)(?=.*three)/ on 3.2 and 3.6. Make sure you try both matching and non-matching queries.
      • Contrast with $and: [{field:/(one)/}, {field:/(two)/}, {field:/(three)/}] as a sanity check.
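The first reproduction step can be sketched as a small generator for synthetic documents. This is only an illustration under stated assumptions: the vocabulary, the seed, the 50% chance of including each search term, and the helper names `makeRng` and `makeDocs` are all hypothetical and do not come from the original report.

```javascript
// Hypothetical helper: a small deterministic LCG so repeated runs
// produce the same documents.
function makeRng(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Assumed filler vocabulary; none of these words contains "one", "two" or "three".
const FILLER = ["alpha", "beta", "gamma", "delta", "server",
                "query", "index", "field", "regex"];
const TERMS = ["one", "two", "three"];

// Build `count` documents whose `field` holds at least `minChars` characters of words.
// Each search term is appended with probability 0.5, so the data set contains
// both documents that match all three terms and documents that do not.
function makeDocs(count, minChars, seed) {
  const rng = makeRng(seed);
  const docs = [];
  for (let i = 0; i < count; i++) {
    let text = "";
    while (text.length < minChars) {
      const word = FILLER[Math.floor(rng() * FILLER.length)];
      text += (text ? " " : "") + word;
    }
    for (const term of TERMS) {
      if (rng() < 0.5) {
        text += " " + term;
      }
    }
    docs.push({ _id: i, field: text });
  }
  return docs;
}

const docs = makeDocs(2000, 500, 42);
```

In the mongo shell, the resulting array could be loaded with `insertMany` into a scratch collection, indexed with `createIndex({ field: 1 })`, and each query shape timed with `explain("executionStats")` on both server versions.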

      We recently upgraded our production servers from 3.2 to 3.6 and started noticing huge CPU spikes and long transactions (60s+) in code that previously caused no issues.

      The backing collection has about 2000 documents. The (indexed) description field is a block of up to 512 characters. The normal query has some other filters that narrow the matching set down to about 800 documents, and then this clause is the critical feature (driven by incremental search from web clients):

      {
        "description" : /(?=.*first)(?=.*second)/
      }

      We add more terms as the user types them.

      On our old 3.2 deployment, this query took about 30ms at most. On 3.6, a query with just two terms takes over 30000ms, and three terms take over 70000ms. It's worst when the first term(s) actually match!
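For illustration, the way the lookahead clause grows as the user types could be sketched as below. The helper names `escapeRegex` and `buildLookaheadFilter` are hypothetical, and escaping regex metacharacters in typed terms is an assumption; the report does not say how the application builds the pattern.

```javascript
// Escape regex metacharacters so a typed term is matched literally.
// (Assumption: the report does not say whether terms are escaped.)
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter with one lookahead per term,
// e.g. { description: /(?=.*first)(?=.*second)/ } for ["first", "second"].
function buildLookaheadFilter(terms) {
  const source = terms
    .map((term) => "(?=.*" + escapeRegex(term) + ")")
    .join("");
  return { description: new RegExp(source) };
}

// Each additional typed term adds one more lookahead to the same regex.
const twoTerms = buildLookaheadFilter(["first", "second"]);
const threeTerms = buildLookaheadFilter(["first", "second", "third"]);
```

The resulting object has the same shape as the clause shown above and could be merged into the other filters of the query.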

      Rewriting the query to

      {
        "$and" : [
          {
             "description" : /(first)/
          },
          {
             "description" : /(second)/
          }
        ]
      }
      

      was slower on 3.2 (from 30ms for the combined regex to 45ms for the $and clauses) but much, much faster on 3.6 (from 30000ms to 70ms).
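The $and rewrite can be generated from the same list of typed terms. The following is a minimal sketch, not the application's actual code: `escapeRegex`, `buildAndFilter` and `matchesAll` are hypothetical names, and escaping the terms is an assumption. `matchesAll` only evaluates the clauses in-process to check the intended semantics; it says nothing about server-side performance.

```javascript
// Escape regex metacharacters so a typed term is matched literally.
// (Assumption: the report does not say whether terms are escaped.)
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build { $and: [ { description: /(first)/ }, { description: /(second)/ } ] }
// with one simple regex clause per term.
function buildAndFilter(terms) {
  if (terms.length === 0) {
    return {}; // MongoDB rejects an empty $and array, so emit no clause at all
  }
  return {
    $and: terms.map((term) => ({
      description: new RegExp("(" + escapeRegex(term) + ")"),
    })),
  };
}

// In-process check: a text satisfies the filter when every clause matches.
function matchesAll(filter, text) {
  return (filter.$and || []).every((clause) => clause.description.test(text));
}

const andFilter = buildAndFilter(["first", "second"]);
```

Because each clause is an independent regex, the order of the terms in the text does not matter, which is the same behaviour the chained lookaheads were used to achieve.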

            Assignee:
            nick.brewer Nick Brewer
            Reporter:
            argh Roger Gonzalez
            Votes:
            0
            Watchers:
            10