Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-10991

$in for large array of regexps is slow

    XMLWordPrintableJSON

Details

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major - P3 Major - P3
    • None
    • 2.4.5
    • Querying
    • Query Execution
    • ALL
    • Hide

      In the attached script run create at least once, then run set qn and run query_regexp for various values of qn.

      Show
      In the attached script run create at least once, then run set qn and run query_regexp for various values of qn.

    Description

      Observed very long running queries involving $in matching a large array of regexps:

      {
                      "foo" : {
                              "$in" : [
                                      /^1000000000/,
                                      /^1000000001/,
                                      /^1000000002/,
                                      /^1000000003/,
                                      /^1000000004/,
                                      /^1000000005/,
                                      /^1000000006/,
                                     ...
      				/^1000009998/,
      				/^1000009999/
      			]
      		}
      }

      Attached script reproduces the issue. Time to run the query is quadratic or worse in the number of regexps; in following table n is number of regexps. Note that this run time is independent of the number of documents in the collection; note for example that in the reproducing script there is only a single document in the collection, and it does not match any of the regexps.

      n time
      5000 1.6 s
      10000 6.6s
      20000 34.0s

      Simple gdb-based sampling profiler shows a single thread spending most of its time for the duration of the query in code that appears to be related to turning the regexps (all of which start with ^) into a set of ranges. Could this algorithm be quadratic in the number of regexps?

      Here's the portion of the call tree showing where the thread is spending its time; first column is average # of threads in that node of the call tree.

       
      1.00        mongo::PortMessageServer::handleIncomingMsg
      1.00          mongo::MyMessageHandler::process
      1.00            mongo::assembleResponse
      1.00              mongo::runQuery
      1.00                mongo::queryWithQueryOptimizer
      1.00                  mongo::NamespaceDetailsTransient::getCursor
      1.00                    mongo::CursorGenerator::generate
      0.99                      mongo::CursorGenerator::setMultiPlanScanner
      0.99                        mongo::MultiPlanScanner::make
      0.99                          mongo::MultiPlanScanner::init
      0.99                            mongo::FieldRangeSet::FieldRangeSet
      0.99                              mongo::FieldRangeSet::init
      0.99                                mongo::FieldRangeSet::handleMatchField
      0.99                                  mongo::FieldRangeSet::handleOp
      0.99                                    mongo::FieldRangeSet::intersectMatchField
      0.99                                      mongo::FieldRange::FieldRange
      0.96                                        mongo::FieldRange::operator|=

      Note that writes are blocked while this is happening.

      Also please note that this has nothing to do with indexes: all the time is being spent preparing the query for execution, as per the above profile, and removing the index (which was in fact used) in the attached reproducing script does not change the behavior (as expected, since in the reproducing script no documents are in fact being matched.)

      Similar query using a large number of strings in a $in match did not have the same issue (see query_string in script).

      Attachments

        1. regexp.js
          1 kB
        2. regexp2.js
          2 kB

        Activity

          People

            backlog-query-execution Backlog - Query Execution
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated: