[DOCS-523] Add additional information on query performance when using regular expressions. Created: 14/Sep/12  Updated: 28/Aug/14  Resolved: 28/Aug/14

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: v1.3.10

Type: Improvement Priority: Minor - P4
Reporter: William Zola Assignee: Kay Kim (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Participants:
Days since reply: 11 years, 22 weeks, 5 days ago

 Description   

The documentation on indexing and query performance is incomplete with regard to how different regex queries perform, and does not cover the performance characteristics of queries containing multiple regex queries.



 Comments   
Comment by William Zola [ 14/Sep/12 ]

The performance of multiple regexes depends on the index chosen to perform the query, and the exact fields used by the query.

Here are the rules:

  • MongoDB dynamically chooses a query plan based on the real-time performance of the query. If multiple query plans are possible, MongoDB will run all of them in parallel for a short while, will determine which one is the best-performing query, and will use that plan for the remainder of the query. Once a query plan is in place, it is cached for reuse. Query plans are re-evaluated on a regular basis.
  • MongoDB can only use a single index for a query. If you have individual indexes on fields A, B, and C within a document, and you perform a query using all three fields as part of the condition, then MongoDB will pick only one index for the query, and will scan the documents returned by the most performant index to meet the conditions on the other two fields
  • MongoDB supports compound indexes. Compound indexes allow you to perform indexed queries that use indexes on multiple fields. For example, if you have an index on {a:1,b:1}

    , then you can do a query of the form

    {a:"value", b:"othervalue"}

    and the query will be run entirely using the index: there's no need to scan any documents to find the precise result set

  • A regular expression can only make maximum use of an index if it is a simple prefix query
  • However, a more complex regular expression can make partial use of the index, if the regex refers to an indexed field. For example, given the index above, a query of the form {a:"value", b: /other*lue/ }

    would be able to fully use the index to match all documents where field 'a' has the value 'value'. The query engine would then have to run the regular expression engine across all of the values of 'b' in that preliminary set. If 'b' is indexed, then the query engine can use the value stored in the index, without having to fetch the actual document.

  • On the other hand, if the regex refers to an unindexed field (or one not included in the chosen index) then the query engine must fetch the entire document off of disk in order to execute the query. For example, assuming the index above, a query of the form {a:"value", c:/other*/ }

    would require that all documents that match

    {a:"value"}

    be fetched from disk so that the query engine can filter them. In the general case, this will be considerably slower than only examining the index.

  • In addition to the above, if the regex is being used to query a field which contains an array, it will always fetch the entire document from disk: regex queries of array fields are never performed only using the index.

The point of all this is the following: multiple regex expressions in a single query may or may not impact performance.

  • If you add an additional regex condition on a field that is already being used for a regex query, then performance will be essentially unaffected
  • If you have an existing regex condition on a field that is present in the index being used, and you add an additional regex condition on a field that is also present in the index being used, then performance will also be unaffected
  • If you have an existing regex condition on a field that is NOT present in the index being used, then adding additional regex conditions on other unindexed fields will not affect performance
  • If you have an existing regex condition on a field that is NOT present in the index being used, then adding additional regex conditions on fields which ARE included in the index will actually improve performance
  • If you have an existing regex condition on a field that is present in the index being used, and you add an additional regex condition on a field that is not present in the index, then performance will be decreased.
Generated at Thu Feb 08 07:38:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.